Date: | Friday, May 23 |
---|---|
Time: | 14:45 |
Location: | N10_302, Institute of Computer Science |
Our guest speaker is Cristian-Alexandru Botocan. He will present his Master's thesis.
You are all cordially invited to the CVG Seminar on May 23rd, 2025 at 2:45 pm CEST
Recent advancements in multi-modal models such as CLIP have significantly enhanced AI tasks such as image classification, object recognition, and cross-modal retrieval by integrating image and language understanding. Assessing the robustness of multi-modal models is an important aspect of the safety of their users. In this talk, we start by assessing the security of SOTA multi-modal models against L0-norm perturbation attacks that alter less than 0.04% of an image's pixels. We then continue with the main talk, focusing on the robustness of multi-modal foundation models against backdoor attacks. We address the shortcomings of the current SOTA defense method and propose a new defense based on Task Arithmetic, a model-merging technique. The best proposed defense incorporates Bayesian Optimization to find the optimal scaling factors of the task vectors representing different fine-tuned models. Our results show that these weighted combinations outperform the current SOTA defense, achieving a favorable balance between Attack Success Rate and Clean Accuracy.
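As background, a task vector is the parameter-wise difference between a fine-tuned model and its pretrained base, and a task-arithmetic merge adds a weighted sum of such vectors back onto the base. The minimal sketch below illustrates only this general merging idea; the toy parameters and the fixed scaling factors are illustrative assumptions, not details of the thesis, where the factors are instead found via Bayesian Optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for model parameters (real models would use full state dicts).
base = {"w": rng.normal(size=4)}
finetuned = [{"w": base["w"] + rng.normal(scale=0.1, size=4)} for _ in range(3)]

def task_vector(base_params, ft_params):
    # Task vector: fine-tuned parameters minus base parameters, per tensor.
    return {k: ft_params[k] - base_params[k] for k in base_params}

def merge(base_params, taus, alphas):
    # Task-arithmetic merge: theta = theta_base + sum_i alpha_i * tau_i.
    merged = {k: v.copy() for k, v in base_params.items()}
    for alpha, tau in zip(alphas, taus):
        for k in merged:
            merged[k] += alpha * tau[k]
    return merged

taus = [task_vector(base, ft) for ft in finetuned]
# Fixed weights here for illustration; the defense described in the talk
# instead searches over these scaling factors with Bayesian Optimization,
# trading off Attack Success Rate against Clean Accuracy.
merged = merge(base, taus, alphas=[0.4, 0.3, 0.3])
print(merged["w"])
```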
Cristian-Alexandru Botocan recently graduated with an MSc in Cybersecurity from EPFL-ETHZ. His academic journey began with a Bachelor's in Computer Science and Engineering at TU Delft, where he opted for the Data Science specialization, focusing on recommendation systems both in academia and in industry, with an internship on the Amazon Music ML team in Berlin. Cristian graduated his Bachelor's cum laude and also completed an additional research programme, the Honours Programme, where he focused on using AI for side-channel attacks against cryptographic protocols. During his Master's, however, his research direction shifted to AI security. Cristian did a research internship at armasuisse Science + Technology, exploring the robustness of multi-modal models against pixel perturbations (https://arxiv.org/pdf/2407.18251). His most recent research experience is his Master's thesis, which focuses on a defense method against backdoor attacks on multi-modal models.
Date: | Friday, Apr. 11 |
---|---|
Time: | 14:45 |
Location: | N10_302, Institute of Computer Science |
Our guest speaker is Omri Avrahami from Hebrew University of Jerusalem.
You are all cordially invited to the CVG Seminar on April 11th, 2025 at 2:45 pm CEST
Classical computer graphics approaches for realistic content synthesis require an elaborate underlying scene representation, which typically describes the geometry and physics of a scene and specifies lighting, camera position, etc. In contrast, generative neural models learn to synthesize diverse visual content from large image datasets, but typically without providing precise fine-grained control. In our work, we aim to develop new tools for visual content synthesis and editing using generative models by exploring various ways to narrow this gap between classical content generation techniques and neural data-driven approaches.
Omri is a Computer Science Ph.D. student at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the joint supervision of Prof. Dani Lischinski and Prof. Ohad Fried. Omri is interested in developing new tools for content synthesis and editing, popularly known as Generative AI. For his latest research, visit https://omriavrahami.com.
Date: | Friday, Mar. 21 |
---|---|
Time: | 15:00 |
Location: | N10_302, Institute of Computer Science |
Our guest speaker is Abdelhak Lemkhenter from Microsoft, Cambridge.
You are all cordially invited to the CVG Seminar on March 21st, 2025 at 3:00 pm CET
Over the last few years, the research community has continuously pushed the boundary of video generative modelling, with many impressive demos of open- and closed-source models. This has led to increasing interest in the steerability of such models and in their ability to capture the different dynamics present in the data. In this talk, we will discuss recent advances in world modelling applied to video games as an interesting setting for training such models. We will discuss the recently published World and Human Action Model (WHAM) through the lens of its design, its evaluation, and the key lessons learned from scaling world models to a modern video game title.
Abdelhak Lemkhenter is a Researcher at Microsoft Research Cambridge, currently working on few-shot imitation learning and world modeling in complex modern video games. His research interests also include robust and scalable representation learning and data-centric learning. He completed his PhD in Informatics at the University of Bern and obtained his Master's degree from the École Centrale de Paris.
Date: | Friday, Jul. 5 |
---|---|
Time: | 14:30 |
Location: | N10_302, Institute of Computer Science |
Our guest speaker is Prof. Anderson Rocha from the University of Campinas (Unicamp), Brazil.
You are all cordially invited to the CVG Seminar on July 5th at 2:30 pm CEST
We explore the burgeoning landscape of synthetic realities (AI-enabled synthetic content allied with narratives and contexts), detailing their impact, technological advancements, and ethical quandaries. Synthetic realities provide innovative solutions and opportunities for immersive experiences across various sectors, including education, healthcare, and commerce. However, these advancements also usher in substantial challenges, such as the propagation of misinformation, privacy concerns, and ethical dilemmas. In this talk, we discuss the specifics of synthetic media, including deepfakes and their generation techniques, and the imperative need for robust detection methods to combat the potential misuse of such technologies, as well as the need for concerted efforts on regulation, standardization, and technological literacy. We show the double-edged nature of synthetic realities and advocate for interdisciplinary research, informed public discourse, and collaborative efforts to harness their benefits while mitigating risks. This talk contributes to the discourse on the responsible development and application of artificial intelligence and synthetic media in modern society.
Anderson Rocha (IEEE Fellow) is a Full Professor of Artificial Intelligence and Digital Forensics at the Institute of Computing, University of Campinas (Unicamp), Brazil. He is the Head of the Artificial Intelligence Lab, Recod.ai, at Unicamp. He is a three-term elected member of the IEEE Information Forensics and Security Technical Committee (IFS-TC), a former chair of that committee, and its chair-elect for the 2025-2026 term. He is a Microsoft Research and a Google Research Faculty Fellow, as well as a Tan Chin Tuan (TCT) Fellow. Since 2023, he has also been an Asia Pacific Artificial Intelligence Association Fellow. He is ranked among the top 2% of research scientists worldwide, according to PlosOne/Stanford and Research.com studies. Finally, he is a LinkedIn Top Voice in Artificial Intelligence for continuously raising awareness of AI and its potential impacts on society at large.
Date: | Friday, Apr. 26 |
---|---|
Time: | 16:00 |
Location: | Online Call via Zoom |
Our guest speaker is Jason Y. Zhang from Carnegie Mellon University.
You are all cordially invited to the CVG Seminar on April 26th at 4 pm CEST
Reconstructing 3D scenes and objects from images alone has been a long-standing goal in computer vision. However, typical methods require a large number of images with precisely calibrated camera poses, which is cumbersome for end users. We propose a probabilistic framework that can predict distributions over relative camera rotations. These distributions are then composed into coherent camera poses given sparse image sets. To improve precision, we then propose a diffusion-based model that represents camera poses as a distribution over rays instead of camera extrinsics. We demonstrate that our system is capable of recovering accurate camera poses from a variety of self-captures and is sufficient for high-quality 3D reconstruction.
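To make the composition step concrete, here is a minimal sketch of chaining pairwise relative rotations into global camera rotations using SciPy; the synthetic ground-truth cameras and the simple chain topology are illustrative assumptions. The actual method predicts distributions over these relative rotations and composes them probabilistically rather than deterministically.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

rng = np.random.default_rng(1)

# Synthetic ground-truth world-to-camera rotations for four cameras.
gt = [R.random(random_state=rng) for _ in range(4)]

# Relative rotation taking camera i's frame to camera j's frame:
# R_ij = R_j * R_i^{-1}.
rel = [gt[i + 1] * gt[i].inv() for i in range(3)]

# Compose relative rotations along a chain, anchoring camera 0 at identity;
# the result matches the ground truth up to a global gauge (choice of frame 0).
global_rots = [R.identity()]
for r_ij in rel:
    global_rots.append(r_ij * global_rots[-1])

# Check: recovered R_k equals gt[k] * gt[0]^{-1}.
for k, rot in enumerate(global_rots):
    expected = gt[k] * gt[0].inv()
    assert np.allclose(rot.as_matrix(), expected.as_matrix())
```

With noisy or ambiguous pairwise predictions, a single chain is fragile, which is one motivation for reasoning over distributions of relative rotations rather than point estimates.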
Jason Y. Zhang is a final-year PhD student at Carnegie Mellon University, advised by Deva Ramanan and Shubham Tulsiani. Jason completed his undergraduate degree at UC Berkeley, where he worked with Jitendra Malik and Angjoo Kanazawa. He is interested in scaling single-view and multi-view 3D to unconstrained environments. Jason is supported in part by the NSF GRFP.