Events
Live and recorded talks from the researchers shaping this domain.
FLUXSynID: High-Resolution Synthetic Face Generation for Document and Live Capture Images
Synthetic face datasets are increasingly used to overcome the limitations of real-world biometric data, including privacy concerns, demographic imbalance, and high collection costs. However, many existing methods lack fine-grained control over identity attributes and fail to produce paired, identity-consistent images under structured capture conditions. In this talk, I will present FLUXSynID, a framework for generating high-resolution synthetic face datasets with user-defined identity attribute distributions and paired document-style and trusted live capture images. The dataset generated using FLUXSynID shows improved alignment with real-world identity distributions and greater diversity compared to prior work. I will also discuss how FLUXSynID’s dataset and generation tools can support research in face recognition and morphing attack detection (MAD), enhancing model robustness in both academic and practical applications.
Speaker
Raul Ismayilov • University of Twente
Scheduled for
Jul 1, 2025, 2:00 PM
Timezone
GMT+1
Deepfake emotional expressions trigger the uncanny valley brain response, even when they are not recognised as fake
Facial expressions are inherently dynamic, and our visual system is sensitive to subtle changes in their temporal sequence. However, researchers often use dynamic morphs of photographs—simplified, linear representations of motion—to study the neural correlates of dynamic face perception. To explore the brain's sensitivity to natural facial motion, we constructed a novel dynamic face database using generative neural networks, trained on a verified set of video-recorded emotional expressions. The resulting deepfakes, consciously indistinguishable from videos, enabled us to separate biological motion from photorealistic form. Results showed that conventional dynamic morphs elicit distinct responses in the brain compared to videos and photos, suggesting they violate expectations (N400) and have reduced social salience (late positive potential). This suggests that dynamic morphs misrepresent facial dynamism, resulting in misleading insights about the neural and behavioural correlates of face perception. Deepfakes and videos elicited largely similar neural responses, suggesting deepfakes could serve as a proxy for real faces in vision research where video recordings cannot be experimentally manipulated. And yet, despite being consciously undetectable as fake, deepfakes elicited an expectation-violation response in the brain, pointing to a neural sensitivity to naturalistic facial motion beyond conscious awareness. Despite some differences in neural responses, the realism and manipulability of deepfakes make them a valuable asset for research where videos are unfeasible. Using these stimuli, we proposed a novel marker for the conscious perception of naturalistic facial motion – frontal delta activity – which was elevated for videos and deepfakes, but not for photos or dynamic morphs.
Speaker
Casey Becker • University of Pittsburgh
Scheduled for
Apr 15, 2025, 4:00 PM
Timezone
GMT+1
Comparing supervised learning dynamics: Deep neural networks match human data efficiency but show a generalisation lag
Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Comparison studies often focus on the end result of the learning process, measuring and comparing the similarities in the representations of object categories once they have been formed. However, the process by which these representations emerge—that is, the behavioral changes and intermediate stages observed during acquisition—is less often directly and empirically compared. In this talk, I will report a detailed investigation of the learning dynamics in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment to align learning-relevant conditions such as starting point, input modality, available input data, and the feedback provided. Across the whole learning process we evaluate and compare how well learned representations generalize to previously unseen test data. Comparisons across the entire learning process indicate that DNNs demonstrate a level of data efficiency comparable to human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: DNNs' learning is characterized by a pronounced generalisation lag, whereas humans appear to immediately acquire generalizable representations, without a preliminary phase of learning training-set-specific information that is only later transferred to novel data.
Speaker
Lukas Huber • University of Bern
Scheduled for
Sep 22, 2024, 10:30 AM
Timezone
GMT+1
Error Consistency between Humans and Machines as a function of presentation duration
Within the last decade, Deep Artificial Neural Networks (DNNs) have emerged as powerful computer vision systems that match or exceed human performance on many benchmark tasks such as image classification. But whether current DNNs are suitable computational models of the human visual system remains an open question: while DNNs have proven capable of predicting neural activations in primate visual cortex, psychophysical experiments have shown behavioral differences between DNNs and human subjects, as quantified by error consistency. Error consistency is typically measured by briefly presenting natural or corrupted images to human subjects and asking them to perform an n-way classification task under time pressure. But for how long should stimuli ideally be presented to guarantee a fair comparison with DNNs? Here we investigate the influence of presentation time on error consistency, to test the hypothesis that higher-level processing drives behavioral differences. We systematically vary presentation times of backward-masked stimuli from 8.3ms to 266ms and measure human performance and reaction times on natural, lowpass-filtered and noisy images. Our experiment constitutes a fine-grained analysis of human image classification under both image corruption and time pressure, showing that even drastically time-constrained humans who are exposed to the stimuli for only two frames, i.e. 16.6ms, can still solve our 8-way classification task with success rates well above chance. We also find that human-to-human error consistency is already stable at 16.6ms.
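As a rough illustration of the metric, here is a minimal sketch of trial-by-trial error consistency in the style of Cohen's kappa, computed over two observers' per-trial correctness; the response streams below are random placeholders, not data from the study.

```python
import numpy as np

def error_consistency(correct_a: np.ndarray, correct_b: np.ndarray) -> float:
    """Kappa-style agreement between two observers' trial-level errors."""
    p_a, p_b = correct_a.mean(), correct_b.mean()
    # Agreement expected by chance, given only the two accuracies
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Observed agreement: both correct or both wrong on the same trial
    c_obs = (correct_a == correct_b).mean()
    return (c_obs - c_exp) / (1 - c_exp)

rng = np.random.default_rng(0)
human = rng.integers(0, 2, size=200)   # placeholder: 1 = correct, 0 = error
dnn = rng.integers(0, 2, size=200)
print(error_consistency(human, dnn))   # ~0 for independent observers
```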
Speaker
Thomas Klein • Eberhard Karls Universität Tübingen
Scheduled for
Jun 30, 2024, 10:30 AM
Timezone
GMT+1
Trends in NeuroAI - Brain-optimized inference improves reconstructions of fMRI brain activity
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri).
Title: Brain-optimized inference improves reconstructions of fMRI brain activity
Abstract: The release of large datasets and developments in AI have led to dramatic improvements in decoding methods that reconstruct seen images from human brain activity. We evaluate the prospect of further improving recent decoding methods by optimizing for consistency between reconstructions and brain activity during inference. We sample seed reconstructions from a base decoding method, then iteratively refine these reconstructions using a brain-optimized encoding model that maps images to brain activity. At each iteration, we sample a small library of images from an image distribution (a diffusion model) conditioned on a seed reconstruction from the previous iteration. We select those that best approximate the measured brain activity when passed through our encoding model, and use these images for structural guidance during the generation of the small library in the next iteration. We reduce the stochasticity of the image distribution at each iteration, and stop when a criterion on the "width" of the image distribution is met. We show that when this process is applied to recent decoding methods, it outperforms the base decoding method as measured by human raters, a variety of image feature metrics, and alignment to brain activity. These results demonstrate that reconstruction quality can be significantly improved by explicitly aligning decoding distributions to brain activity distributions, even when the seed reconstruction is output from a state-of-the-art decoding algorithm. Interestingly, the rate of refinement varies systematically across visual cortex, with earlier visual areas generally converging more slowly and preferring narrower image distributions, relative to higher-level brain areas. Brain-optimized inference thus offers a succinct and novel method for improving reconstructions and exploring the diversity of representations across visual brain areas.
Speaker: Reese Kneeland is a Ph.D. student at the University of Minnesota working in the Naselaris lab.
Paper link: https://arxiv.org/abs/2312.07705
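The refinement loop lends itself to a short schematic. The sketch below is an assumption-laden paraphrase of the abstract, not the paper's code: `base_decoder`, `diffusion_sample`, and `encoding_model` are hypothetical callables, the selection keeps a single best image rather than a guided subset, and the stopping rule is a fixed iteration count instead of the paper's distribution-width criterion.

```python
import numpy as np

def brain_optimized_inference(fmri, base_decoder, diffusion_sample,
                              encoding_model, n_iters=10, library_size=16):
    seed = base_decoder(fmri)                 # seed reconstruction
    noise = 1.0                               # sampler stochasticity
    for _ in range(n_iters):
        # Sample a small library of images conditioned on the current seed
        library = [diffusion_sample(seed, noise) for _ in range(library_size)]
        # Score each image by how well its predicted activity matches the data
        scores = [np.corrcoef(encoding_model(im), fmri)[0, 1] for im in library]
        seed = library[int(np.argmax(scores))]  # best image guides next round
        noise *= 0.9                            # narrow the image distribution
    return seed
```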
Speaker
Reese Kneeland
Scheduled for
Jan 4, 2024, 11:00 AM
Timezone
EDT
Trends in NeuroAI - Meta's MEG-to-image reconstruction
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri). This will be an informal journal club presentation; no author of the paper will be joining us.
Title: Brain decoding: toward real-time reconstruction of visual perception
Abstract: In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from a limited temporal resolution (≈0.5 Hz) and thus fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution (≈5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end, and iii) a pretrained image generator. Our results are threefold: First, our MEG decoder shows a 7X improvement in image retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that MEG signals primarily contain high-level visual features, whereas the same approach applied to 7T fMRI also recovers low-level features. Overall, these results provide an important step towards the decoding, in real time, of the visual processes continuously unfolding within the human brain.
Speaker: Dr. Paul Scotti (Stability AI, MedARC)
Paper link: https://arxiv.org/abs/2310.19812
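The three-module design can be sketched compactly. The following is a hypothetical PyTorch outline, not the authors' implementation: the sensor/time shapes, layer sizes, and the loss weighting are all assumptions; only the overall structure (frozen image embeddings, an end-to-end MEG module, combined contrastive and regression objectives) follows the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MEGDecoder(nn.Module):
    """Maps an MEG recording to the embedding space of a pretrained image model."""
    def __init__(self, n_sensors=273, n_times=181, emb_dim=768):  # placeholder shapes
        super().__init__()
        self.net = nn.Sequential(                  # MEG module, trained end-to-end
            nn.Flatten(),
            nn.Linear(n_sensors * n_times, 2048), nn.GELU(),
            nn.Linear(2048, emb_dim),
        )

    def forward(self, meg):                        # meg: (batch, sensors, times)
        return self.net(meg)

def loss_fn(z_meg, z_img, temperature=0.07):
    # Regression objective: predict the (frozen) pretrained image embedding ...
    mse = F.mse_loss(z_meg, z_img)
    # ... plus a contrastive objective over the batch (positives on the diagonal)
    logits = F.normalize(z_meg, dim=-1) @ F.normalize(z_img, dim=-1).T
    targets = torch.arange(len(z_meg))
    contrastive = F.cross_entropy(logits / temperature, targets)
    return mse + contrastive
```

At inference, the predicted embedding would condition a pretrained image generator (the third module), which is omitted here.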
Speaker
Paul Scotti
Scheduled for
Dec 6, 2023, 11:00 AM
Timezone
EDT
Diverse applications of artificial intelligence and mathematical approaches in ophthalmology
Ophthalmology is ideally placed to benefit from recent advances in artificial intelligence. It is a highly image-based specialty and provides unique access to the microvascular circulation and the central nervous system. This talk will demonstrate diverse applications of machine learning and deep learning techniques in ophthalmology, including in age-related macular degeneration (AMD), the leading cause of blindness in industrialized countries, and cataract, the leading cause of blindness worldwide. This will include deep learning approaches to automated diagnosis, quantitative severity classification, and prognostic prediction of disease progression, both from images alone and accompanied by demographic and genetic information. The approaches discussed will include deep feature extraction, label transfer, and multi-modal, multi-task training. Cluster analysis, an unsupervised machine learning approach to data classification, will be demonstrated by its application to geographic atrophy in AMD, including exploration of genotype-phenotype relationships. Finally, mediation analysis will be discussed, with the aim of dissecting complex relationships between AMD disease features, genotype, and progression.
Speaker
Tiarnán Keenan • National Eye Institute (NEI)
Scheduled for
Jun 5, 2023, 3:00 PM
Timezone
GMT
Euclidean coordinates are the wrong prior for primate vision
The mapping from the visual field to V1 can be approximated by a log-polar transform. In this domain, scale is a left-right shift and rotation is an up-down shift. When fed into a standard shift-invariant convolutional network, this provides scale and rotation invariance. However, translation invariance is lost; in our model, this is compensated for by multiple fixations on an object. Due to the high concentration of cones in the fovea and the drop-off of resolution in the periphery, the central 10 degrees of visual angle takes up about half of V1, with the remaining 170 degrees (or so) taking up the other half. This layout provides the basis for the central and peripheral pathways. Simulations with this model closely match human performance in scene classification, and competition between the pathways leads to the peripheral pathway being used for this task. Remarkably, in spite of its rotation invariance, this model can explain the inverted face effect. We suggest that the standard use of image coordinates is the wrong prior for models of primate vision.
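For readers who want the transform concretely: a minimal NumPy sketch of log-polar resampling, under the simplifying assumption of a fixed image center; the grid sizes are arbitrary.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar(img, n_rho=64, n_theta=64):
    cy, cx = (np.asarray(img.shape, dtype=float) - 1) / 2
    r_max = min(cy, cx)
    rho = np.linspace(0, np.log(r_max), n_rho)         # log-radius axis
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r = np.exp(rho)[:, None]                           # (n_rho, 1)
    ys = cy + r * np.sin(theta)[None, :]               # (n_rho, n_theta)
    xs = cx + r * np.cos(theta)[None, :]
    return map_coordinates(img, [ys, xs], order=1)

img = np.random.default_rng(0).random((128, 128))      # placeholder image
lp = log_polar(img)                                     # rho x theta map
# Scaling img 2x shifts lp by log(2) along rho; rotating img shifts it along
# theta, so shift-invariant convolutions on lp give scale and rotation
# invariance (translation invariance is lost, as the talk notes).
```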
Speaker
Gary Cottrell • University of California, San Diego (UCSD)
Scheduled for
May 9, 2023, 11:00 AM
Timezone
EDT
Understanding and Mitigating Bias in Human & Machine Face Recognition
With the increasing use of automated face recognition (AFR) technologies, it is important to consider whether these systems not only perform accurately, but also equitably, or without "bias". Despite rising public, media, and scientific attention to this issue, the sources of bias in AFR are not fully understood. This talk will explore how human cognitive biases may impact our assessments of performance differentials in AFR systems and our subsequent use of those systems to make decisions. We'll also show how, if we adjust our definition of what a "biased" AFR algorithm looks like, we may be able to create algorithms that optimize the performance of a human+algorithm team, not simply the algorithm itself.
Speaker
John Howard • Maryland Test Facility
Scheduled for
Apr 11, 2023, 4:00 PM
Timezone
GMT+1
Learning to see stuff
Humans are very good at visually recognizing materials and inferring their properties. Without touching surfaces, we can usually tell what they would feel like, and we enjoy vivid visual intuitions about how they typically behave. This is impressive because the retinal image that the visual system receives as input is the result of complex interactions between many physical processes. Somehow the brain has to disentangle these different factors. I will present some recent work in which we show that an unsupervised neural network trained on images of surfaces spontaneously learns to disentangle reflectance, lighting and shape. However, the disentanglement is not perfect, and we find that as a result the network not only predicts the broad successes of human gloss perception, but also the specific pattern of errors that humans exhibit on an image-by-image basis. I will argue this has important implications for thinking about appearance and vision more broadly.
Speaker
Roland W. Fleming • Giessen University
Scheduled for
Mar 12, 2023, 2:00 PM
Timezone
GMT+1
Automated generation of face stimuli: Alignment, features and face spaces
I describe a well-tested Python module that does automated alignment and warping of face images, and some advantages over existing solutions. An additional tool I've developed does automated extraction of facial features, which can be used in a number of interesting ways. I illustrate the value of wavelet-based features with a brief description of two recent studies: perceptual in-painting, and the robustness of the whole-part advantage across a large stimulus set. Finally, I discuss the suitability of various deep learning models for generating stimuli to study perceptual face spaces. I believe those interested in the forensic aspects of face perception may find this talk useful.
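As a flavor of what landmark-based alignment involves, here is a minimal sketch (not the module being presented): solve for the similarity transform that maps two detected eye landmarks onto template positions. All coordinate values are placeholders.

```python
import numpy as np

def align_two_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """2x3 similarity (scale, rotation, translation) matrix mapping src onto dst."""
    (x1, y1), (x2, y2) = src
    (u1, v1), (u2, v2) = dst
    # u = a*x - b*y + tx,  v = b*x + a*y + ty  ->  solve for (a, b, tx, ty)
    A = np.array([[x1, -y1, 1, 0], [y1, x1, 0, 1],
                  [x2, -y2, 1, 0], [y2, x2, 0, 1]], dtype=float)
    rhs = np.array([u1, v1, u2, v2], dtype=float)
    a, b, tx, ty = np.linalg.solve(A, rhs)
    return np.array([[a, -b, tx], [b, a, ty]])

eyes = np.array([[120.0, 140.0], [200.0, 138.0]])    # detected eye landmarks
template = np.array([[96.0, 112.0], [160.0, 112.0]]) # canonical eye positions
M = align_two_points(eyes, template)  # e.g. usable with cv2.warpAffine
```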
Speaker
Carl Gaspar • Zayed University (UAE)
Scheduled for
Jan 31, 2023, 2:00 PM
Timezone
GMT+1
Mouse visual cortex as a limited resource system that self-learns an ecologically-general representation
Studies of the mouse visual system have revealed a variety of visual brain areas in a roughly hierarchical arrangement, together with a multitude of behavioral capacities, ranging from stimulus-reward associations to goal-directed navigation and object-centric discriminations. However, an overall understanding of the organization of mouse visual cortex, and how this organization supports visual behaviors, remains elusive. Here, we take a computational approach to help address these questions, providing a high-fidelity quantitative model of mouse visual cortex. By analyzing factors contributing to model fidelity, we identified key principles underlying the organization of mouse visual cortex. Structurally, we find that comparatively low-resolution and shallow structure were both important for model fidelity. Functionally, we find that models trained with task-agnostic, unsupervised objective functions based on the concept of contrastive embeddings were substantially better than models trained with supervised objectives. Finally, the unsupervised objective builds a general-purpose visual representation that enables the system to achieve better transfer on out-of-distribution visual scene understanding and reward-based navigation tasks. Our results suggest that mouse visual cortex is a low-resolution, shallow network that makes the best use of the mouse's limited resources to create a lightweight, general-purpose visual system – in contrast to the deep, high-resolution, and more task-specific visual system of primates.
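As a concrete reference point, here is one common contrastive-embedding objective (a SimCLR-style loss over two augmented views of the same images); the specific unsupervised objective used in this work may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmentations of the same images."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature        # similarity of every cross-view pair
    targets = torch.arange(len(z1))         # positives lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```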
Speaker
Aran Nayebi • MIT
Scheduled for
Nov 1, 2022, 5:00 PM
Timezone
GMT+1
Computational Imaging: Augmenting Optics with Algorithms for Biomedical Microscopy and Neural Imaging
Computational imaging seeks to achieve novel capabilities and overcome conventional limitations by combining optics and algorithms. In this seminar, I will discuss two computational imaging technologies developed in the Boston University Computational Imaging Systems lab: intensity diffraction tomography and the Computational Miniature Mesoscope. In our intensity diffraction tomography system, we demonstrate 3D quantitative phase imaging on a simple LED-array microscope. We develop both single-scattering and multiple-scattering models to image complex biological samples. In our Computational Miniature Mesoscope, we demonstrate single-shot 3D high-resolution fluorescence imaging across a wide field of view in a miniaturized platform. We develop methods to characterize 3D spatially varying aberrations, and physical-simulator-based deep learning strategies to achieve fast and accurate reconstructions. Broadly, I will discuss how synergies between novel optical instrumentation, physical modeling, and model- and learning-based computational algorithms can push the limits of biomedical microscopy and neural imaging.
Speaker
Lei Tian • Department of Electrical and Computer Engineering, Boston University
Scheduled for
Aug 21, 2022, 11:00 AM
Timezone
GMT-3
Learning with less labels for medical image segmentation
Accurate segmentation of medical images is a key step in developing Computer-Aided Diagnosis (CAD) systems and automating various clinical tasks such as image-guided interventions. The success of state-of-the-art methods for medical image segmentation relies heavily on the availability of a sizable amount of labelled data. When the required quantity of labelled data cannot be obtained, these methods become fragile. The principle of consensus tells us that as humans, when we are uncertain how to act in a situation, we tend to look to others to determine how to respond. In this webinar, Dr Mehrtash Harandi will show how to model the principle of consensus to learn to segment medical data with limited labelled data. In doing so, we design multiple segmentation models that collaborate with each other to learn from labelled and unlabelled data collectively.
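One simple way to model consensus between segmentation models is co-training-style mutual pseudo-labelling. The sketch below is a generic illustration under that assumption, not necessarily Dr Harandi's method, and the loss weighting is arbitrary.

```python
import torch
import torch.nn.functional as F

def consensus_step(model_a, model_b, labelled, unlabelled):
    """One training step for two segmentation models that supervise each other."""
    x, y = labelled                         # x: (N,C,H,W) images, y: (N,H,W) masks
    sup = (F.cross_entropy(model_a(x), y) +
           F.cross_entropy(model_b(x), y))  # ordinary supervised loss
    pa, pb = model_a(unlabelled), model_b(unlabelled)
    # Each model learns from the other's (detached) pseudo-labels
    cons = (F.cross_entropy(pa, pb.argmax(1).detach()) +
            F.cross_entropy(pb, pa.argmax(1).detach()))
    return sup + 0.5 * cons                 # consensus weight is a free choice
```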
Speaker
Mehrtash Harandi • Monash University
Scheduled for
Aug 2, 2022, 12:30 PM
Timezone
GMT+11
Feedforward and feedback processes in visual recognition
Progress in deep learning has spawned great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural networks, are now approaching – and sometimes even surpassing – human accuracy on a variety of visual recognition tasks. In this talk, however, I will show that these neural networks and their recent extensions exhibit a limited ability to solve seemingly simple visual reasoning problems involving incremental grouping, similarity, and spatial relation judgments. Our group has developed a recurrent network model of classical and extra-classical receptive field circuits that is constrained by the anatomy and physiology of the visual cortex. The model was shown to account for diverse visual illusions providing computational evidence for a novel canonical circuit that is shared across visual modalities. I will show that this computational neuroscience model can be turned into a modern end-to-end trainable deep recurrent network architecture that addresses some of the shortcomings exhibited by state-of-the-art feedforward networks for solving complex visual reasoning tasks. This suggests that neuroscience may contribute powerful new ideas and approaches to computer science and artificial intelligence.
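To make the contrast with feedforward processing concrete, here is a generic sketch of a convolutional block refined by recurrent (horizontal) connections over several timesteps; the actual circuit model presented in the talk is considerably more structured than this.

```python
import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    """Feedforward drive plus iterated horizontal recurrence (generic sketch)."""
    def __init__(self, channels=32, steps=6):
        super().__init__()
        self.ff = nn.Conv2d(3, channels, 5, padding=2)          # feedforward drive
        self.rec = nn.Conv2d(channels, channels, 5, padding=2)  # horizontal connections
        self.steps = steps

    def forward(self, x):
        drive = self.ff(x)
        h = torch.zeros_like(drive)
        for _ in range(self.steps):         # iterate toward a settled state
            h = torch.relu(drive + self.rec(h))
        return h
```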
Speaker
Thomas Serre • Brown University
Scheduled for
Jun 21, 2022, 5:00 PM
Timezone
GMT+1
Measuring the Motions of Mice: Open source tracking with the KineMouse Wheel
Who says you can't reinvent the wheel?! This running wheel for head-fixed mice allows 3D reconstruction of body kinematics using a single camera and DeepLabCut (or similar) software. A lightweight, transparent polycarbonate floor and a mirror mounted on the inside allow two views to be captured simultaneously. All parts are commercially available or laser-cut.
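In principle, the mirror turns one camera into two: the direct view gives horizontal and vertical coordinates, while the mirrored bottom view gives horizontal and depth coordinates for the same keypoint. The sketch below illustrates only that fusion step, assuming already-tracked, already-calibrated keypoints.

```python
import numpy as np

def fuse_views(direct_xy: np.ndarray, mirror_xz: np.ndarray) -> np.ndarray:
    """direct_xy, mirror_xz: (n_keypoints, 2) calibrated coordinates per view."""
    x = (direct_xy[:, 0] + mirror_xz[:, 0]) / 2   # shared axis, averaged
    y = direct_xy[:, 1]                           # height from the direct view
    z = mirror_xz[:, 1]                           # depth from the mirror view
    return np.stack([x, y, z], axis=1)            # (n_keypoints, 3)
```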
Speaker
Jimmy Tabet • Department of Biomedical Engineering UNC/NCSU
Scheduled for
May 17, 2022, 12:00 PM
Timezone
GMT-3
PiSpy: An Affordable, Accessible, and Flexible Imaging Platform for the Automated Observation of Organismal Biology and Behavior
A great deal of understanding can be gleaned from direct observation of organismal growth, development, and behavior. However, direct observation can be time-consuming and can influence the organism through unintentional stimuli. Additionally, video-capture equipment can often be prohibitively expensive, difficult to modify to one's specific needs, and may come with unnecessary features. Here, we describe the PiSpy, a low-cost, automated video acquisition platform that uses a Raspberry Pi computer and camera to record video or images at specified time intervals or when externally triggered. All settings and controls, such as programmable light cycling, are accessible to users with no programming experience through an easy-to-use graphical user interface. Importantly, the entire PiSpy system can be assembled for less than $100 using laser-cut and 3D-printed components. We demonstrate the broad applications and flexibility of the PiSpy across a range of model and non-model organisms. Designs, instructions, and code can be accessed through an online repository, where a global community of PiSpy users can also contribute their own unique customizations and help grow the community of open-source research solutions.
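For a sense of the underlying mechanism, here is a minimal sketch of interval capture on a Raspberry Pi using the legacy picamera library; PiSpy's actual code, GUI, GPIO triggering, and light-cycle control live in its repository, and the resolution and interval below are arbitrary.

```python
import time
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (1920, 1080)   # placeholder resolution

INTERVAL_S = 60                    # capture one frame per minute
for i in range(100):
    camera.capture(f"frame_{i:04d}.jpg")
    time.sleep(INTERVAL_S)
```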
Speaker
Gregory Pask and Benjamin Morris • Middlebury College
Scheduled for
Apr 19, 2022, 12:30 PM
Timezone
GMT-3
Forensic use of face recognition systems for investigation
With the increasing development of automatic systems and artificial intelligence, face recognition is becoming ever more important in forensic and civil contexts. However, face recognition has yet to be thoroughly empirically studied to provide an adequate scientific and legal framework for investigative and court purposes. This observation sets the foundation for our research, which focuses on issues related to face images and the use of automatic systems. Our objective is to validate a likelihood ratio computation methodology for interpreting comparison scores from automatic face recognition systems (the score-based likelihood ratio, SLR). We collected three types of traces: portraits (ID), video surveillance footage recorded by an ATM camera, and footage from a wide-angle camera (CCTV). We compare the performance of two automatic face recognition systems: the commercial IDEMIA Morphoface (MFE) system and the open-source FaceNet algorithm.
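A score-based likelihood ratio can be sketched in a few lines: fit densities to same-source and different-source score distributions, then evaluate the trace-versus-suspect comparison score under both. The data below is simulated, and Gaussian KDE is just one modelling choice, not necessarily the calibration method validated in this work.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
same_source = rng.normal(0.8, 0.10, 1000)   # placeholder genuine-pair scores
diff_source = rng.normal(0.3, 0.15, 1000)   # placeholder impostor-pair scores

f_same = gaussian_kde(same_source)          # density under same-source hypothesis
f_diff = gaussian_kde(diff_source)          # density under different-source hypothesis

score = 0.7                                 # score from the AFR system
slr = f_same(score)[0] / f_diff(score)[0]
print(f"SLR = {slr:.2f}")                   # >1 supports the same-source hypothesis
```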
Speaker
Maëlig Jacquet • University of Lausanne
Scheduled for
Apr 10, 2022, 2:30 PM
Timezone
GMT+1
Probabilistic computation in natural vision
A central goal of vision science is to understand the principles underlying the perception and neural coding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a rich spatial and temporal context. Second, the full structure of population activity—including the substantial trial-to-trial variability that is shared among neurons—determines encoded information and, ultimately, perception. In the first part of this talk, I will argue for a normative approach to study encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized to the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, and data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we are leveraging this framework to develop deep probabilistic algorithms for natural image and video segmentation.
Speaker
Ruben Coen-Cagli • Albert Einstein College of Medicine
Scheduled for
Mar 29, 2022, 11:00 AM
Timezone
EDT
Identity-Expression Ambiguity in 3D Morphable Face Models
3D Morphable Models are my favorite class of generative models and are commonly used to model faces. They are typically applied to ill-posed problems such as 3D reconstruction from 2D data. I'll start my presentation with an introduction to 3D Morphable Models and show what they are capable of. I'll then focus on our recent finding, the identity-expression ambiguity: we demonstrate that non-orthogonality of the variation in identity and expression can cause identity-expression ambiguity in 3D Morphable Models, and that in practice expression and identity are far from orthogonal and can explain each other surprisingly well. Whilst previously reported ambiguities arise only in an inverse rendering setting, the identity-expression ambiguity emerges in the 3D shape generation process itself. The goal of this presentation is to demonstrate the ambiguity and discuss its potential consequences in a computer vision setting as well as for understanding face perception mechanisms in the human brain.
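A small numerical sketch of the idea, with random stand-in bases (bases learned from real faces overlap far more, so the effect is much stronger in practice): in a linear morphable model, shape = mean + U_id @ alpha + U_exp @ beta, so we can project an identity offset onto the expression basis and see how much of it expressions alone can explain.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k_id, k_exp = 150, 60, 50           # tiny stand-ins for 3*n_vertices, etc.
U_id = rng.normal(size=(d, k_id))      # identity basis (random stand-in)
U_exp = rng.normal(size=(d, k_exp))    # expression basis (random stand-in)

alpha = rng.normal(size=k_id)
identity_offset = U_id @ alpha         # shape change due to identity alone
# Least-squares attempt to explain the identity change with expressions only
beta, *_ = np.linalg.lstsq(U_exp, identity_offset, rcond=None)
explained = 1 - (np.linalg.norm(identity_offset - U_exp @ beta)
                 / np.linalg.norm(identity_offset)) ** 2
print(f"fraction of identity change explained by expression: {explained:.2f}")
```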
Speaker
Bernhard Egger • Friedrich-Alexander-Universität Erlangen-Nürnberg
Scheduled for
Mar 16, 2022, 5:00 PM
Timezone
GMT+1