News

Latest publication in NeurIPS 2024

Nov. 25, 2024

A paper from our group got accepted to NeurIPS 2024!

Blind Image Restoration via Fast Diffusion Inversion

Hamadi Chihaoui, Abdelhak Lemkhenter and Paolo Favaro, in the 38th Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

Image Restoration (IR) methods based on a pre-trained diffusion model have demonstrated state-of-the-art performance. However, they have two fundamental limitations: 1) they often assume that the degradation operator is completely known and 2) they alter the diffusion sampling process, which may result in restored images that do not lie on the data manifold. To address these issues, we propose Blind Image Restoration via fast Diffusion inversion (BIRD), a blind IR method that jointly optimizes for the degradation model parameters and the restored image. To ensure that the restored images lie on the data manifold, we propose a novel sampling technique on a pre-trained diffusion model. A key idea in our method is not to modify the reverse sampling, i.e., not to alter any of the intermediate latents, once an initial noise is sampled. This is ultimately equivalent to casting the IR task as an optimization problem in the space of the input noise. Moreover, to mitigate the computational cost associated with inverting a fully unrolled diffusion model, we leverage the inherent capability of these models to skip ahead in the forward diffusion process using large time steps. We experimentally validate BIRD on several image restoration tasks and show that it achieves state-of-the-art performance.

Project page: https://hamadichihaoui.github.io/BIRD/

Paper: https://arxiv.org/abs/2405.19572

Latest publication in CVPR 2024

June 16, 2024

A paper from our group got accepted to CVPR 2024!

Masked and Shuffled Blind Spot Denoising for Real-World Images

Hamadi Chihaoui and Paolo Favaro, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

We introduce a novel approach to single image denoising based on the Blind Spot Denoising principle, which we call MAsked and SHuffled Blind Spot Denoising (MASH). We focus on the case of correlated noise, which often plagues real images. MASH is the result of a careful analysis to determine the relationships between the level of blindness (masking) of the input and the (unknown) noise correlation. Moreover, we introduce a shuffling technique to weaken the local correlation of noise, which in turn yields an additional denoising performance improvement. We evaluate MASH via extensive experiments on real-world noisy image datasets. We demonstrate results on par with or better than existing self-supervised denoising methods.

Paper: https://arxiv.org/abs/2404.09389

Latest publication in AAAI 2024

Feb. 20, 2024

A paper from our group got accepted to AAAI 2024!

Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation

Aram Davtyan and Paolo Favaro, in the AAAI Conference on Artificial Intelligence, 2024.

We propose a novel unsupervised method to autoregressively generate videos from a single frame and a sparse motion input. Our trained model can generate unseen realistic object-to-object interactions. Although our model has never been given the explicit segmentation and motion of each object in the scene during training, it is able to implicitly separate their dynamics and extents. Key components in our method are the randomized conditioning scheme, the encoding of the input motion control, and the randomized and sparse sampling to enable generalization to out-of-distribution but realistic correlations. Our model, which we call YODA, therefore has the ability to move objects without physically touching them. Through extensive qualitative and quantitative evaluations on several datasets, we show that YODA is on par with or better than state-of-the-art prior work in video generation in terms of both controllability and video quality.

Project website: https://araachie.github.io/yoda

Paper: https://arxiv.org/abs/2306.03988

Latest publications in ICCV 2023

Aug. 21, 2023

Two papers from our group got accepted to ICCV 2023!

Efficient Video Prediction via Sparsely Conditioned Flow Matching

Aram Davtyan*, Sepehr Sameni* and Paolo Favaro, in IEEE International Conference on Computer Vision (ICCV 2023)

We introduce a novel generative model for video prediction based on latent flow matching, an efficient alternative to diffusion-based models. In contrast to prior work, we keep the high costs of modeling the past during training and inference at bay by conditioning only on a small random set of past frames at each integration step of the image generation process. Moreover, to enable the generation of high-resolution videos and to speed up the training, we work in the latent space of a pretrained VQGAN. Finally, we propose to approximate the initial condition of the flow ODE with the previous noisy frame. This allows us to reduce the number of integration steps and hence speed up the sampling at inference time. We call our model Random frame conditioned flow Integration for VidEo pRediction, or, in short, RIVER. We show that RIVER achieves superior or on par performance compared to prior work on common video prediction benchmarks, while requiring an order of magnitude fewer computational resources.

Project website: https://araachie.github.io/river

Paper: https://arxiv.org/abs/2211.14575

Spatio-Temporal Crop Aggregation for Video Representation Learning

Sepehr Sameni, Simon Jenni and Paolo Favaro, in IEEE International Conference on Computer Vision (ICCV 2023)

We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time. Our model builds long-range video features by learning from sets of video clip-level features extracted with a pretrained backbone. To train the model, we propose a self-supervised objective consisting of masked clip feature predictions. We apply sparsity both to the input, by extracting a random set of video clips, and to the loss function, by only reconstructing the sparse inputs. Moreover, we use dimensionality reduction by working in the latent space of a pre-trained backbone applied to single video clips. These techniques make our method not only extremely efficient to train but also highly effective in transfer learning. We demonstrate that our video representation yields state-of-the-art performance with linear, nonlinear, and k-NN probing on common action classification and video understanding datasets.

Paper: https://arxiv.org/abs/2211.17042

Latest publication in NeurIPS 2022

Sept. 14, 2022

Our paper on unsupervised object segmentation got accepted to NeurIPS 2022!

MOVE: Unsupervised Movable Object Segmentation and Detection

Adam Bielski and Paolo Favaro, in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

We introduce MOVE, a novel method to segment objects without any form of supervision. MOVE exploits the fact that foreground objects can be shifted locally relative to their initial position and result in realistic (undistorted) new images. This property allows us to train a segmentation model on a dataset of images without annotation and to achieve state-of-the-art (SotA) performance on several evaluation datasets for unsupervised salient object detection and segmentation. In unsupervised single object discovery, MOVE gives an average CorLoc improvement of 7.2% over the SotA, and in unsupervised class-agnostic object detection it gives a relative AP improvement of 53% on average. Our approach is built on top of self-supervised features (e.g., from DINO or MAE), an inpainting network (based on the Masked AutoEncoder) and adversarial training.

Paper: https://arxiv.org/abs/2210.07920
