April 12 Colloquium: "Learning to Understand Our Multimodal World with Minimal Supervision"

Talk Abstract

The field of computer vision is undergoing another profound change. Recently, "generalist" models have emerged that can solve a variety of visual perception tasks. Also known as foundation models, they are trained on huge amounts of internet-scale unlabeled or weakly labeled data, and can adapt to new tasks without any additional supervision or with just a small number of manually labeled samples. Moreover, some are multimodal: they understand both language and images, and can support other perceptual modes as well.

In this talk, I will present our recent research on creating AI systems that can learn to understand our multimodal world with minimal human supervision. I will focus on systems that can understand images and text, and also touch upon those that utilize video, audio, and lidar. Since training foundation models from scratch can be prohibitively expensive, I will discuss how to efficiently repurpose existing foundation models for use in application-specific tasks. I will also discuss how these models can be used for image generation and, in turn, for detecting AI-generated images. I will conclude by highlighting key remaining challenges and promising research directions.

Biography

Yong Jae Lee is an Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. His research interests are in computer vision and machine learning, with a focus on creating robust visual recognition systems that can learn to understand the visual world with minimal human supervision. Before joining UW-Madison in 2021, he spent one year as an AI Visiting Faculty at Cruise, and before that, six years as an Assistant and then Associate Professor at UC Davis. He received his Ph.D. from the University of Texas at Austin in 2012, advised by Kristen Grauman, and was a postdoc at Carnegie Mellon University (2012-2013) and UC Berkeley (2013-2014) advised by Alyosha Efros. He is a recipient of the ARO Young Investigator Program Award (2017), UC Davis Hellman Fellowship (2017), NSF CAREER Award (2018), AWS Machine Learning Research Awards (2018, 2019), Adobe Data Science Research Awards (2019, 2022), UC Davis College of Engineering Outstanding Junior Faculty Award (2019), Sony Focused Research Awards (2020, 2023), and UW-Madison SACM Student Choice Professor of the Year Award (2022). He and his collaborators received the Most Innovative Award at the COCO Object Detection Challenge ICCV 2019 and the Best Paper Award at BMVC 2020.

Website: https://pages.cs.wisc.edu/~yongjaelee/

Location

Sennott Square Building, Room 5317

Date

Friday, April 12, from 2:00 p.m. to 3:15 p.m.

Faculty Host

Dr. Adriana Kovashka
