Abdelhak Lemkhenter and Paolo Favaro, in German Conference on Pattern Recognition (GCPR), 2020.
In this work, we introduce Phase Swap, a novel self-supervised learning task for bio-signals. Most hand-crafted features for bio-signals in general, and EEG in particular, are derived from the power spectrum, e.g., by considering the energy of the signal within predefined frequency bands. In fact, the phase information is most often discarded because it is more sample-specific, and thus more sensitive to noise, than the amplitude. However, various medical studies have shown a link between the phase component and physiological patterns such as cognitive functions in the case of brain activity. Motivated by this line of research, we build a self-supervised task that encourages the trained models to learn the implicit phase-amplitude coupling. This task, named Phase Swap, consists of discriminating between real samples and samples whose phase component in the Fourier domain was replaced by one taken from another sample. We show that the learned self-supervised features generalize better across experimental settings and subject identities than a supervised baseline for two classification tasks, seizure detection and sleep scoring, on four different data sets: ExpandedEDF (Sleep Cassette + Sleep Telemetry), CHB-MIT and ISRUC-Sleep. These findings highlight the benefits of our self-supervised pre-training for various machine learning applications for bio-signals.
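The phase-swapping operation described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `phase_swap` and `make_phase_swap_example`, the 50/50 sampling of real and swapped samples, and the choice to swap along the last (time) axis are assumptions made here for illustration.

```python
import numpy as np


def phase_swap(x_amp, x_phase):
    """Combine the Fourier amplitude of x_amp with the Fourier phase of x_phase.

    Hypothetical helper: both inputs are real-valued arrays of the same
    shape, with time along the last axis.
    """
    f_amp = np.fft.rfft(x_amp, axis=-1)
    f_phase = np.fft.rfft(x_phase, axis=-1)
    # Keep the magnitude of the first signal, take the phase of the second.
    mixed = np.abs(f_amp) * np.exp(1j * np.angle(f_phase))
    # Back to the time domain, preserving the original length.
    return np.fft.irfft(mixed, n=x_amp.shape[-1], axis=-1)


def make_phase_swap_example(x, x_other, rng):
    """Return (sample, label): label 0 for a real sample, 1 for a swapped one.

    The equal probability of the two classes is an assumption.
    """
    if rng.random() < 0.5:
        return x, 0
    return phase_swap(x, x_other), 1


rng = np.random.default_rng(0)
signal_a = rng.standard_normal(256)  # stand-in for one bio-signal window
signal_b = rng.standard_normal(256)  # stand-in for another window
swapped = phase_swap(signal_a, signal_b)
sample, label = make_phase_swap_example(signal_a, signal_b, rng)
```

A classifier trained on such pairs cannot rely on the power spectrum alone, since a swapped sample has the same amplitude spectrum as the real sample it was built from; only the coupling between phase and amplitude differs.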
Simon Jenni and Paolo Favaro, in Asian Conference on Computer Vision (ACCV), 2020.
Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large, costly data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such models in the pre-training step towards supporting 3D pose estimation, we introduce a novel self-supervised feature learning task designed to focus on the 3D structure in an image. We exploit images extracted from videos captured with a multi-view camera system. The task is to classify whether two images depict two views of the same scene up to a rigid transformation. In a multi-view data set, where objects deform in a non-rigid manner, a rigid transformation occurs only between two views taken at the exact same time, i.e., when they are synchronized. We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
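The way training pairs for such a synchronization task might be drawn can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function `sample_sync_pair`, the `max_offset` parameter, and the 50/50 split between synchronized and unsynchronized pairs are hypothetical.

```python
import random


def sample_sync_pair(num_frames, num_views, rng, max_offset=10):
    """Sample two (view, frame) indices and a synchronization label.

    Hypothetical helper: returns ((view_a, t_a), (view_b, t_b), label),
    where label is 1 if both views come from the same time step and 0
    otherwise. Requires num_frames >= 2 and num_views >= 2.
    """
    # Two distinct cameras observing the same scene.
    view_a, view_b = rng.sample(range(num_views), 2)
    t_a = rng.randrange(num_frames)
    if rng.random() < 0.5:
        # Synchronized: the two views differ only by a rigid transformation.
        return (view_a, t_a), (view_b, t_a), 1
    # Unsynchronized: pick a nearby but different time step, so the
    # subject has (in general) deformed non-rigidly between the frames.
    low = max(0, t_a - max_offset)
    high = min(num_frames, t_a + max_offset + 1)
    candidates = [t for t in range(low, high) if t != t_a]
    t_b = rng.choice(candidates)
    return (view_a, t_a), (view_b, t_b), 0


rng = random.Random(0)
pairs = [sample_sync_pair(num_frames=100, num_views=4, rng=rng) for _ in range(200)]
```

Each sampled index pair would then be used to load the two images that a binary classifier receives as input, with `label` as the target.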