Posts by Collection



Unsupervised Odometry and Depth Learning for Endoscopic Capsule Robots

Published in IROS, 2018

Mehmet Turan, Evin Pinar Ornek, Nail Ibrahimli, Can Giracoglu, Yasin Almalioglu, Mehmet Fatih Yanik, Metin Sitti

In the last decade, many medical companies and research groups have tried to convert passive capsule endoscopes, an emerging and minimally invasive diagnostic technology, into actively steerable endoscopic capsule robots that would enable more intuitive disease detection, targeted drug delivery, and biopsy-like operations in the gastrointestinal (GI) tract. In this study, we introduce a fully unsupervised, real-time odometry and depth learner for monocular endoscopic capsule robots. We establish supervision by warping view sequences and using re-projection error minimization as the loss function, which we adopt in a multi-view pose estimation network and a single-view depth estimation network. Detailed quantitative and qualitative analyses of the proposed framework, performed on non-rigidly deformable ex-vivo porcine stomach datasets, prove the effectiveness of the method in terms of motion estimation and depth recovery.
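
As a toy illustration of this kind of view-synthesis supervision (not the authors' implementation), the sketch below warps a source view into the target frame using a predicted depth map and relative pose, then measures a photometric L1 loss; the function name, nearest-neighbour sampling, and plain-NumPy setting are my own simplifications.

```python
import numpy as np

def reprojection_loss(target, source, depth, K, R, t):
    """Photometric L1 loss between the target view and the source view
    warped into the target frame using predicted depth and relative pose
    (nearest-neighbour sampling for brevity)."""
    h, w = depth.shape
    # Pixel grid in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # Back-project to 3D with the predicted depth, transform, re-project.
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    proj = K @ (R @ cam + t.reshape(3, 1))
    us = np.round(proj[0] / proj[2]).astype(int)
    vs = np.round(proj[1] / proj[2]).astype(int)
    valid = (us >= 0) & (us < w) & (vs >= 0) & (vs < h) & (proj[2] > 0)
    warped = np.zeros_like(target)
    warped[v.reshape(-1)[valid], u.reshape(-1)[valid]] = source[vs[valid], us[valid]]
    return np.abs(target - warped)[v.reshape(-1)[valid], u.reshape(-1)[valid]].mean()
```

With an identity pose and matching images, the warp is the identity and the loss vanishes, which is the self-supervision signal the networks are trained to minimize.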

Download here

Push-the-Boundary: Boundary-aware Feature Propagation for Semantic Segmentation of 3D Point Clouds

Published in 3DV, 2022

Shenglan Du, Nail Ibrahimli, Jantien Stoter, Julian Kooij, Liangliang Nan

Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundaries to implicitly improve the networks' feature encoding. These approaches often require additional modules that are difficult to integrate into the original architecture.
To improve segmentation near object boundaries, we propose a boundary-aware feature propagation mechanism. This mechanism is achieved by exploiting a multi-task learning framework that aims to explicitly guide the boundaries to their original locations. With one shared encoder, our network outputs, in three parallel streams, (i) boundary localization, (ii) predicted directions pointing to each object's interior, and (iii) semantic segmentation. The predicted boundaries and directions are fused to propagate the learned features and refine the segmentation. We conduct extensive experiments on the S3DIS and SensatUrban datasets against various baseline methods, demonstrating that our approach yields consistent improvements by reducing boundary errors.
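
A minimal sketch of the fusion idea, assuming per-point boundary probabilities and interior-pointing directions have already been predicted; the function name and the nearest-neighbour lookup are illustrative, not the paper's implementation.

```python
import numpy as np

def propagate_features(points, feats, boundary_prob, directions, step=0.1, thr=0.5):
    """For points predicted to lie on an object boundary, replace their
    features with those of the nearest point found one step along the
    predicted interior-pointing direction (toy nearest-neighbour version)."""
    refined = feats.copy()
    for i in np.where(boundary_prob > thr)[0]:
        shifted = points[i] + step * directions[i]
        # Nearest neighbour to the shifted location, excluding the point itself.
        d = np.linalg.norm(points - shifted, axis=1)
        d[i] = np.inf
        refined[i] = feats[np.argmin(d)]
    return refined
```

The effect is that boundary points inherit the (usually more reliable) features of interior points, which is the intuition behind the learned propagation in the paper.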

Download here

DDL-MVS: Depth Discontinuity Learning for MVS Networks

Published in Remote Sensing, 2023

Nail Ibrahimli, Hugo Ledoux, Julian Kooij, Liangliang Nan

Traditional MVS methods achieve good accuracy but struggle with completeness, while recently developed learning-based multi-view stereo (MVS) techniques have improved completeness at the cost of accuracy. We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction. Our idea is to jointly estimate the depth and boundary maps, where the boundary maps are explicitly used to further refine the depth maps. We validate our idea and demonstrate that our strategies can be easily integrated into existing learning-based MVS pipelines in which the reconstruction depends on high-quality depth map estimation. Extensive experiments on various datasets show that our method improves reconstruction quality compared to the baselines. Experiments also demonstrate that the presented model and strategies have good generalization capabilities.
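
A toy stand-in for boundary-guided depth refinement (not the paper's learned refinement): smooth the depth map everywhere except at pixels the boundary map flags as discontinuities, so depth edges survive the smoothing.

```python
import numpy as np

def refine_depth(depth, boundary, iters=10):
    """Iteratively replace each pixel with its 4-neighbour average,
    except at pixels flagged as depth discontinuities, which keep their
    values so sharp depth edges are preserved (np.roll wraps at borders)."""
    d = depth.astype(float).copy()
    for _ in range(iters):
        avg = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
               np.roll(d, 1, 1) + np.roll(d, -1, 1)) / 4.0
        # Keep the original values at predicted boundary pixels.
        d = np.where(boundary > 0.5, d, avg)
    return d
```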

Download here

MuVieCAST: Multi-View Consistent Artistic Style Transfer

Published in 3DV, 2024

Nail Ibrahimli, Julian Kooij, Liangliang Nan

We present MuVieCAST, a modular network architecture for multi-view consistent style transfer between multiple viewpoints of the same scene. The architecture supports both sparse and dense views, making it versatile enough to handle a wide range of multi-view image datasets. The approach consists of three modules that perform specific tasks related to style transfer, namely content preservation, image transformation, and multi-view consistency enforcement. We evaluate our approach extensively across multiple application domains, including depth-map-based point cloud fusion, mesh reconstruction, and novel-view synthesis. The results demonstrate that the framework produces high-quality stylized images while maintaining consistency across multiple views, even for complex styles involving mosaic tessellations or extensive brush strokes. Our modular framework is extensible and can easily be integrated with various backbone architectures, making it a flexible solution for multi-view style transfer. Project page:
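
To make the consistency-enforcement idea concrete, here is a toy measure of how consistently two stylized views agree, assuming pixel correspondences from geometry (depth and pose) are given; the function name and signature are illustrative, not part of MuVieCAST.

```python
import numpy as np

def multiview_consistency_loss(stylized_a, stylized_b, corr):
    """Mean colour difference between corresponding pixels of two stylized
    views; `corr` lists pixel pairs ((ya, xa), (yb, xb)) assumed to come
    from scene geometry, which this toy version takes as given."""
    diffs = [np.abs(stylized_a[ya, xa] - stylized_b[yb, xb]).mean()
             for (ya, xa), (yb, xb) in corr]
    return float(np.mean(diffs))
```

A loss of zero means corresponding surface points received identical colours in both stylized views, which is what multi-view consistency enforcement drives toward.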

Download here


Paper presentation: Manhattan SDF


Manhattan SDF addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Many previous works have shown impressive reconstruction results on textured objects, but they still have difficulty handling low-textured planar regions, which are common in indoor scenes. One approach to this issue is to incorporate planar constraints into the depth map estimation of multi-view stereo-based methods, but the per-view plane estimation and depth optimization lack both efficiency and multi-view consistency. In this work, the authors show that planar constraints can be conveniently integrated into recent implicit neural representation-based reconstruction methods. Specifically, the work uses an MLP network to represent the signed distance function as the scene geometry. Based on the Manhattan-world assumption, planar constraints are employed to regularize the geometry in floor and wall regions predicted by a 2D semantic segmentation network. To resolve inaccurate segmentation, Manhattan SDF encodes the semantics of 3D points with another MLP and designs a novel loss that jointly optimizes the scene geometry and semantics in 3D space. Experiments on the ScanNet and 7-Scenes datasets show that the proposed method outperforms previous methods by a large margin in 3D reconstruction quality.
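
A minimal sketch of a Manhattan-style planar regularizer, assuming unit surface normals (e.g., SDF gradients) and per-point semantic labels are available; the label convention and function name are my own, not the paper's.

```python
import numpy as np

def manhattan_loss(normals, labels, up=np.array([0.0, 0.0, 1.0])):
    """Manhattan-world regularizer on unit surface normals: floor normals
    (label 1) should align with the up axis, wall normals (label 2) should
    be perpendicular to it; other labels are left unconstrained."""
    cos_up = normals @ up
    floor = 1.0 - cos_up[labels == 1]   # 0 when parallel to up
    wall = np.abs(cos_up[labels == 2])  # 0 when perpendicular to up
    terms = np.concatenate([floor, wall])
    return float(terms.mean()) if terms.size else 0.0
```

Minimizing such a term pushes floor geometry flat and walls vertical, which is how the planar prior regularizes the low-textured regions.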

The project page can be found at this link

Volumetric differentiable rendering


Lately, learning-based 3D reconstruction methods have shown impressive results. One line of research over the last couple of years, unlike most traditional and other learning-based methods, does not require 3D supervision, which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Approaches restricted to voxel- and mesh-based representations suffer from discretization or low resolution. In this webinar, I will talk about a differentiable rendering formulation for implicit shape and texture representations, which have recently gained popularity as they represent shape and texture continuously. I will present DVR (Niemeyer et al. 2020), which shows that depth gradients can be derived analytically using the concept of implicit differentiation. This allows neural networks to learn implicit shape and texture representations directly from RGB images. DVR can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.
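
The implicit-differentiation step can be written out in a few lines (in my notation; see the DVR paper for the exact formulation). For a ray \(p(d) = r_0 + d\,w\) with surface depth \(\hat{d}\) and implicit field \(f_\theta\) satisfying \(f_\theta(\hat{p}) = \tau\) at \(\hat{p} = r_0 + \hat{d}\,w\), differentiating the surface condition with respect to the network parameters gives an analytic depth gradient:

```latex
f_\theta\bigl(\hat{p}(\theta)\bigr) = \tau
\quad\Rightarrow\quad
\frac{\partial f_\theta(\hat{p})}{\partial \theta}
  + \nabla_p f_\theta(\hat{p}) \cdot w \,
    \frac{\partial \hat{d}}{\partial \theta} = 0
\quad\Rightarrow\quad
\frac{\partial \hat{d}}{\partial \theta}
  = -\bigl(\nabla_p f_\theta(\hat{p}) \cdot w\bigr)^{-1}
    \frac{\partial f_\theta(\hat{p})}{\partial \theta}.
```

This is what lets gradients from an RGB reconstruction loss flow through the rendered depth into the network without any explicit 3D supervision.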

Join Zoom Meeting via this link

Recent development in learning-based 3D Reconstruction


Since 2015, there has been ongoing research on stereo learning. Siamese networks were initially employed to densely match patches. Cost-volume-regularization-based stereo techniques have become more common since 2017. Learning-based multi-view stereo made it possible to reconstruct more complete 3D models. Approaches based on differentiable rendering and neural rendering have recently gained in popularity. Using positional encoding and volumetric rendering, it became feasible to reconstruct non-Lambertian surfaces in addition to synthesizing novel viewpoints. In this talk, we will go through the most recent developments in both paradigms (stereo-based and neural rendering-based).
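
As a small concrete example of one ingredient mentioned above, here is a NeRF-style positional encoding sketch: each coordinate is mapped to sines and cosines at exponentially growing frequencies so an MLP can represent high-frequency geometry and appearance (number of frequencies and function name are illustrative).

```python
import numpy as np

def positional_encoding(p, num_freqs=4):
    """Map each coordinate p to (sin(2^k * pi * p), cos(2^k * pi * p))
    for k = 0 .. num_freqs-1, concatenated into one feature vector."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = p[:, None] * freqs[None, :]          # (n_coords, num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1).reshape(-1)
```

Each scalar input becomes a 2 × num_freqs feature vector; nearby inputs get rapidly diverging encodings at high frequencies, which is what lets the downstream MLP fit sharp detail.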


Photogrammetry and 3D Computer Vision

Graduate course, Delft University of Technology, 2020

Liangliang Nan, Nail Ibrahimli
2020-2021 Q3, 2021-2022 Q4, 2022-2023 Q4

Photogrammetry and 3D Computer Vision (i.e., 3DV) aims at recovering the structure of real-world objects/scenes from images. This course covers the theories, methodologies, and techniques of 3D computer vision for the built environment. Over the term of this course, you will learn the basic knowledge and algorithms of 3D computer vision through a series of lectures, reading materials, lab exercises, and group assignments.

Machine Learning for the Built Environment

Graduate course, Delft University of Technology, 2021

Liangliang Nan, Shenglan Du, Nail Ibrahimli
2021-2022 Q3, 2022-2023 Q3

This introductory machine learning course equips students with the basic knowledge and skills for further study and research in machine learning. It introduces the theory and methods of well-established machine learning as well as a few state-of-the-art deep learning techniques for processing geospatial data (e.g., point clouds). Students will also gain hands-on experience by applying commonly used machine learning techniques to solve practical problems through a series of lab exercises and assignments.