Seminars and Talks

Segmenting Objects without Manual Supervision
by Laurynas Karazija
Date: Friday, Jan. 12
Time: 14:30
Location: Online Call via Zoom

Our guest speaker is Laurynas Karazija from the Visual Geometry Group, University of Oxford.

You are all cordially invited to the CVG Seminar on January 12th at 2:30 pm CET

  • via Zoom (passcode is 043728).

Abstract

Detecting, localising, and representing the objects that comprise the visual world is an important and interesting problem with many downstream applications. Today's systems are supervised, relying on extensive and expensive manual annotations. In this talk, I will introduce some recent works that explore learning from appearance, motion, and language in an unsupervised or weakly-supervised manner. In particular, I will focus on the drawbacks of appearance-based object-centric models, explain how to teach segmentation networks using optical flow in an end-to-end manner, and show how pretrained generative diffusion models can be used to synthesise segmenters directly by sampling and representing objects and their context.

Bio

Laurynas Karazija is a PhD student at the Visual Geometry Group at the University of Oxford, UK, working with Prof Andrea Vedaldi, Prof Christian Rupprecht and Dr Iro Laina. He focuses on learning to understand and decompose the visual world into distinct objects with as little supervision as possible.

Three Views on View Synthesis
by Kyle Sargent
Date: Friday, Dec. 15
Time: 16:00
Location: Online Call via Zoom

Our guest speaker is Kyle Sargent from Stanford Vision Lab.

You are all cordially invited to the CVG Seminar on December 15th at 4 pm CET

  • via Zoom (passcode is 520944).

Abstract

Novel view synthesis from a single image is an important problem in computer vision. Several sources of randomness and ill-posedness make the problem extremely challenging. I will present three papers from across my research career, each taking a very different perspective and technical approach to the problem. As the talk progresses, I will explain how I have come to regard 3D generative modeling and 3D novel view synthesis as closely connected, and give supporting evidence. The final paper I will present is ZeroNVS: Zero-shot 360-degree View Synthesis from a Single Real Image, my most recent paper, which is currently in submission.

Bio

Kyle Sargent is a second-year PhD student in the Stanford Vision Lab, advised by Jiajun Wu and Fei-Fei Li. He works on 3D generative models and novel view synthesis. He has written several papers for top vision conferences, including two first- or co-first-authored Best Paper Award finalists at CVPR 2022 and ICCV 2023. Prior to joining Stanford, he was an AI Resident at Google Research, and before that, an undergraduate at Harvard.

Using Deep Generative Models for Representation Learning and Beyond
by Daiqing Li
Date: Thursday, Dec. 7
Time: 16:00
Location: Online Call via Zoom

Our guest speaker is Daiqing Li from Playground.

You are all cordially invited to the CVG Seminar on December 7th at 4 pm CET

  • via Zoom (passcode is 102781).

Abstract

Diffusion-based deep generative models have demonstrated remarkable performance in text-conditioned synthesis tasks across images, videos, and 3D. In this talk, I will discuss how large-scale text-to-image (T2I) models can be used as vision foundation models for representation learning and other downstream tasks, such as synthetic dataset generation and semantic segmentation.

Bio

Daiqing Li is currently a research lead at Playground, where his primary focus is on advancing pixel foundation models. Previously, he was a senior research scientist at the NVIDIA Toronto AI Lab, where his research spanned computer vision, computer graphics, generative models, and machine learning. He collaborated closely with Sanja Fidler and Antonio Torralba at NVIDIA, and several of his works have been integrated into NVIDIA products, notably Omniverse and Clara. Daiqing graduated from the University of Toronto and was a runner-up for the MICCAI Young Scientist Award. His recent research focuses on using generative models for dataset synthesis, perception tasks, and representation learning. He is the author of SemanticGAN, BigDatasetGAN, and DreamTeacher.

Event-based optical flow and stereo depth estimation using contrast maximization
by Guillermo Gallego
Date: Monday, Apr. 17
Time: 09:00
Location: N10_302, Institute of Computer Science

Our guest speaker is Dr. Guillermo Gallego from TU Berlin.

You are all cordially invited to the CVG Seminar on April 17th at 9 am CET

Abstract

Event cameras are novel vision sensors that mimic functions from the human retina and offer potential advantages over traditional cameras (low latency, high speed, high dynamic range, etc.). They acquire visual information in the form of pixel-wise brightness changes, called events. This talk presents event processing approaches for motion estimation in computer vision and robotics applications. In particular, we will discuss recent advances by the Robotic Interactive Perception Lab at TU Berlin in extending the contrast maximization framework to optical flow and stereo depth estimation.
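For context on the contrast maximization framework mentioned above, the following is a minimal sketch of its core idea for estimating a single optical-flow vector from a batch of events: warp the events to a reference time with a candidate flow, accumulate them into an image of warped events (IWE), and search for the flow that maximizes the sharpness of that image. The event layout, function names, and the choice of variance as the contrast objective are illustrative assumptions here, not the speaker's implementation.

# Minimal contrast-maximization sketch (illustrative, hypothetical names).
# Events are given as arrays of pixel coordinates (x, y) and timestamps t.
import numpy as np
from scipy.optimize import minimize

def image_of_warped_events(flow, x, y, t, t_ref, height, width):
    """Warp each event to t_ref with the candidate flow and accumulate
    the warped events into a 2D histogram (the IWE)."""
    vx, vy = flow
    xw = np.round(x - (t - t_ref) * vx).astype(int)
    yw = np.round(y - (t - t_ref) * vy).astype(int)
    # Keep only events that land inside the image plane.
    keep = (xw >= 0) & (xw < width) & (yw >= 0) & (yw < height)
    iwe = np.zeros((height, width))
    np.add.at(iwe, (yw[keep], xw[keep]), 1.0)
    return iwe

def negative_contrast(flow, x, y, t, t_ref, height, width):
    """The IWE variance is high when warped events align along the true
    motion, so we minimize its negative."""
    iwe = image_of_warped_events(flow, x, y, t, t_ref, height, width)
    return -np.var(iwe)

def estimate_flow(x, y, t, height, width, init=(0.0, 0.0)):
    """Search for the flow vector (pixels per unit time) that maximizes
    the contrast of the image of warped events."""
    t_ref = t.min()
    result = minimize(negative_contrast, np.asarray(init),
                      args=(x, y, t, t_ref, height, width),
                      method="Nelder-Mead")
    return result.x

The same objective can be optimized over other motion models (e.g. rotation or depth for stereo), which is how the framework extends to the optical flow and stereo depth settings discussed in the talk.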

Bio

Guillermo Gallego is an Associate Professor at TU Berlin and the Einstein Center Digital Future, Berlin, Germany. He is also a PI of the Science of Intelligence Excellence Cluster. He received his PhD in Electrical and Computer Engineering from the Georgia Institute of Technology, USA, in 2011. From 2011 to 2014 he was a Marie Curie researcher at Universidad Politecnica de Madrid, Spain, and from 2014 to 2019 he was a postdoctoral researcher with the Robotics and Perception Group at the University of Zurich, Switzerland. He serves as an Associate Editor for IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Robotics and Automation Letters, and the International Journal of Robotics Research.

Understanding Long Videos with Minimal Supervision
by Tengda Han
Date: Friday, Mar. 17
Time: 15:00
Location: Online Call via Zoom

Our guest speaker is Tengda Han from the Visual Geometry Group (VGG), University of Oxford.

You are all cordially invited to the CVG Seminar on March 17th at 3:00 pm CET

  • via Zoom (passcode is 690015).

Abstract

Videos are an appealing data source for training computer vision models: there is an almost infinite supply of videos online, but exhaustive manual annotation is infeasible. In this talk, I will briefly introduce a few methods for learning strong video representations with minimal human annotation, with an emphasis on long videos that go beyond a few seconds.

Bio

Tengda Han is a post-doctoral research fellow at the Visual Geometry Group at the University of Oxford. He obtained his PhD from the same group in 2022, supervised by Andrew Zisserman. His current research focuses on self-supervised learning, efficient learning, and video understanding.