I am a PhD student in the Computer Vision Group at the University of Bern, supervised by Prof. Dr. Paolo Favaro. I received my BSc and MSc degrees in Computer Science from the University of Bern in 2016 and 2018, respectively. The topics I am passionate about include (but are not limited to) Machine Learning, Computer Vision and Computer Graphics. In my PhD thesis I study and develop algorithms for estimating optical flow from two or more video frames. These algorithms are based on deep artificial neural networks that learn an abstract representation of motion from many example videos in an unsupervised way.
Optical Flow, the problem of recovering a vector field that describes the motion of every pixel from one image to the next (for example, between consecutive frames of a video), is one of the oldest problems in Computer Vision.
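As a minimal illustration of this definition (the frame and flow values below are hypothetical toy data), the flow field assigns one 2D displacement vector to every pixel, telling us where that pixel moves in the next frame:

```python
import numpy as np

# Toy 4x4 grayscale frame and a dense flow field (one 2D vector per
# pixel). A pixel at (y, x) in frame t moves to (y + v, x + u) in
# frame t+1.
frame_t = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0  # u: every pixel moves one column to the right

# Warp frame t forward with the flow: frame_t1[y, x+u] = frame_t[y, x]
frame_t1 = np.zeros_like(frame_t)
ys, xs = np.mgrid[0:4, 0:4]
xt = (xs + flow[..., 0]).astype(int)
yt = (ys + flow[..., 1]).astype(int)
valid = (xt >= 0) & (xt < 4) & (yt >= 0) & (yt < 4)
frame_t1[yt[valid], xt[valid]] = frame_t[ys[valid], xs[valid]]
```

Note that the leftmost column of `frame_t1` receives no pixel at all: this is exactly a disocclusion, one of the ambiguous cases discussed below.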
Applications of Optical Flow can be found in almost any system that deals with motion, e.g., in video compression, video frame interpolation (high frame rate), motion segmentation, 3D reconstruction and more.
To this day, researchers continue to develop methods that estimate Optical Flow faster, with greater accuracy, or with more robustness to ambiguities.
One major challenge that prior work tries to address is the estimation of Optical Flow in ambiguous regions, e.g., regions that are occluded or disoccluded, or that have little to no texture.
We believe that a data-driven approach can overcome the limitations of prior work and learn to handle the aforementioned challenges.
Since Optical Flow annotations do not naturally emerge from real-world datasets, and synthetically generated videos and flows limit generalization to real data, we must strive towards an unsupervised approach, i.e., one that does not rely on labelled data.
In this project, we investigate several possible generalizations of Optical Flow that naturally handle occlusions and have subpixel accuracy.
The approach is self-supervised: the only training data are frames from high-frame-rate video recordings, and no other annotation is needed.
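The self-supervision signal commonly used in this setting is a photometric reconstruction loss: warp one frame towards the other using the estimated flow and penalize the remaining intensity difference. The sketch below (with hypothetical toy frames, and nearest-neighbor sampling for brevity where practical systems use differentiable bilinear sampling and occlusion masks) shows the idea:

```python
import numpy as np

def photometric_loss(frame1, frame2, flow):
    """Warp frame2 back to frame1 using the estimated flow and
    penalize the per-pixel intensity difference. No annotation is
    needed; the raw frames themselves provide the training signal."""
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Backward warping: look up where each pixel of frame1 lands in frame2.
    xt = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    yt = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    warped = frame2[yt, xt]
    return np.mean(np.abs(frame1 - warped))

# A frame shifted one pixel to the right, and the flow that explains it.
f1 = np.tile(np.arange(8.0), (8, 1))
f2 = np.roll(f1, 1, axis=1)
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
```

With the correct flow, the loss is low (only image-boundary pixels contribute); with a wrong flow (e.g., all zeros) it is much higher, which is what drives the network towards the true motion.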
Structure from motion (SfM) is the problem of reconstructing the 3D geometry and camera parameters given a set of photographs of a scene. State-of-the-art SfM systems assume that all observed motion in the measurements is caused by the camera's motion, and objects that move in the scene are considered noise. Handling this type of noise is indeed one of the main difficulties in SfM. Occlusions, changes in lighting and specular reflections are other examples of noise that challenge the robustness of an SfM system. In our research, we consider a temporal sequence of images (video) instead of an unordered set. This makes our setting suitable for real-time applications where the input is a continuous video stream. We aim to build a system that incrementally outputs the estimated 3D and camera parameters as it reads the video frames one after the other, and we are investigating several Deep Learning approaches to solve the aforementioned challenges for this type of sequential data.
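At its core, an SfM system searches for the 3D points and camera parameters that best explain the observed 2D measurements. The toy sketch below (all values hypothetical, with a simplified pinhole camera whose principal point is at the origin) shows the reprojection error that such a system minimizes:

```python
import numpy as np

def reproject(points_3d, R, t, f):
    """Project 3D world points into a pinhole camera with rotation R,
    translation t, and focal length f (principal point at origin)."""
    cam = points_3d @ R.T + t            # world -> camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]  # perspective division

# Hypothetical scene: two points in front of an identity camera.
pts = np.array([[0.0, 0.0, 4.0], [1.0, -1.0, 2.0]])
obs = reproject(pts, np.eye(3), np.zeros(3), f=1.0)  # "measurements"

# Reprojection error: the residual an SfM system minimizes jointly
# over structure (pts) and camera parameters (R, t). Here the camera
# is exactly right, so the error is zero.
err = np.linalg.norm(reproject(pts, np.eye(3), np.zeros(3), 1.0) - obs, axis=1)
```

Moving objects violate the assumption that a single (R, t) per frame explains all measurements, which is why they act as noise in this formulation.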
2018–20: Computer Architecture, Prof. Paolo Favaro
2018–20: Computer Vision, Prof. Paolo Favaro
2016: Computer Graphics, Prof. Matthias Zwicker