[Seminar] ILLS seminar: Information Analysis and Methods for Representation Learning and Data-Driven Structural Detection

A talk of ILLS will be held on Thursday, November 24 at 12:00 in hybrid mode.

Title: Information Analysis and Methods for Representation Learning and Data-Driven Structural Detection
Jorge F. Silva, Professor at the University of Chile, Santiago de Chile

Abstract:
Machine learning and Information Theory are two broad research areas with strong connections. In this presentation, we will cover two topics that explore the use of information-theoretic measures in learning. On the first topic, we will present results that show how adequate the adoption of mutual information is for predicting the operational quality of a transformation (or encoder) in classification. These results offer new insights into adopting information measures in machine learning, like mutual information and cross-entropy. For the second topic, we will discuss the idea of information sufficiency, representing a model’s latent structure, and explore a non-parametric data-driven method to detect this type of structure from data. We will focus on the learning-decision task of testing independence using a non-parametric mutual information estimator. We present non-asymptotic and asymptotic results that support the advantage of this approach and elaborate on applications for learning data structure and model-change detection.
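
As a rough illustration of testing independence with a mutual information estimate, the sketch below uses a simple plug-in (empirical frequency) estimator on discrete samples together with a permutation test. This is an assumption made for illustration only: it is not the data-driven estimator analyzed in the talk, and the function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def permutation_independence_test(xs, ys, num_permutations=200, seed=0):
    """Return (observed MI, p-value) under the null hypothesis of independence.

    Shuffling ys breaks any dependence on xs, so the shuffled estimates
    approximate the distribution of the statistic under independence.
    """
    rng = random.Random(seed)
    observed = plugin_mutual_information(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if plugin_mutual_information(xs, shuffled) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (num_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    xs = [0, 1] * 50
    ys = list(xs)  # fully dependent on xs
    print(permutation_independence_test(xs, ys))
```

A small p-value suggests rejecting independence. The plug-in estimator is biased upward for small samples, which is one reason a permutation threshold is used here rather than a fixed cutoff.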

References:
Silva and Tobar, “On the Interplay between Information Loss and Operation Loss in Representations for Classification,” in AISTATS 2022.
Gonzales et al., “Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection with Mutual Information Estimation,” IEEE Trans. on Signal Proc., 70, 2022.

Short-Bio:
Jorge F. Silva (Senior Member, IEEE) is an Associate Professor in the Electrical Engineering (EE) Department at Universidad de Chile and Principal Investigator with the Advanced Center of Electrical and Electronic Engineering in Valparaíso, Chile. He received M.Sc. and Ph.D. degrees in Electrical Engineering from the University of Southern California (USC), Los Angeles, CA, USA, in 2005 and 2008, respectively. He was a Research Assistant with the Signal Analysis and Interpretation Laboratory (SAIL), USC, during 2003–2008 and was also a Research Intern with the Speech Research Group, Microsoft Corporation, Redmond, in 2005. He received the Outstanding Thesis Award 2009 for Theoretical Research of the Viterbi School of Engineering, the Viterbi Doctoral Fellowship 2007-2008, and the Simon Ramo Scholarship 2007-2008 at USC. He was an Associate Editor for the IEEE Transactions on Signal Processing from 2006 to 2008.

* In person: ETS-LIVIA, room A-3600.
* Zoom link: https://cnrs.zoom.us/j/96338640901?pwd=MkNlT0FFS1c1T2Z6c0dManpLc3l1dz09
* Meeting ID: 963 3864 0901

[Seminar] Joint Attention for Dimensional Emotion Recognition using Audio Visual Fusion

The next LIVIA seminar will be held on Wednesday, November 2 at 12h00 in hybrid mode.

Title: Joint Attention for Dimensional Emotion Recognition using Audio Visual Fusion
by Gnana Praveen Rajasekar, Ph.D. candidate at the LIVIA

Abstract:
Automatic emotion recognition (ER) has recently gained a lot of interest due to its potential in many real-world applications. In this context, multimodal approaches have been shown to improve performance (over unimodal approaches) by combining diverse and complementary sources of information, providing some robustness to noisy and missing modalities. We focus on dimensional ER based on the fusion of facial and vocal modalities extracted from videos, where complementary audio-visual (A-V) relationships are explored to predict an individual’s emotional states in valence-arousal space. Most state-of-the-art fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. To address this problem, we introduce a joint cross-attentional model for A-V fusion that extracts the salient features across A-V modalities, effectively leveraging the inter-modal relationships while retaining the intra-modal relationships. In particular, it computes the cross-attention weights based on the correlation between the joint feature representation and that of the individual modalities. Deploying the joint A-V feature representation in the cross-attention module helps to simultaneously leverage both the intra- and inter-modal relationships, thereby significantly improving the performance of the system over the vanilla cross-attention module. The effectiveness of our proposed approach is validated experimentally on challenging videos from the RECOLA and AffWild2 datasets. Results indicate that our joint cross-attentional A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches, even when the modalities are noisy or absent.

https://arxiv.org/pdf/2209.09068.pdf

* In person: ETS-LIVIA, room A-3600.
* Zoom link: https://etsmtl.zoom.us/j/84820130813

[Seminar] Semi-Weakly Supervised Object Detection by Sampling Pseudo Ground-Truth Boxes

The next LIVIA seminar will be held on Thursday, August 18 at 12h00 in hybrid mode.

Title: Semi-Weakly Supervised Object Detection by Sampling Pseudo Ground-Truth Boxes
by Akhil Meethal, Ph.D. candidate at the LIVIA

Abstract:
Semi- and weakly-supervised learning have recently attracted considerable attention in the object detection literature since they can alleviate the cost of annotation needed to successfully train deep learning models. State-of-the-art approaches for semi-supervised learning rely on student-teacher models trained using a multi-stage process and considerable data augmentation. Custom networks have been developed for the weakly-supervised setting, making it difficult to adapt to different detectors. In this paper, a weakly semi-supervised training method is introduced that reduces these training challenges, yet achieves state-of-the-art performance by leveraging only a small fraction of fully-labeled images with information in weakly-labeled images. In particular, our generic sampling-based learning strategy produces pseudo-ground-truth (GT) bounding box annotations in an online fashion, eliminating the need for multi-stage training and student-teacher network configurations. These pseudo GT boxes are sampled from weakly-labeled images based on the categorical score of object proposals accumulated via a score propagation process. Empirical results on the Pascal VOC dataset indicate that the proposed approach improves performance by 5.0% when using VOC 2007 as fully-labeled and VOC 2012 as weakly-labeled data. Also, with 5-10% fully annotated images, we observed an improvement of more than 10% in mAP, showing that a modest investment in image-level annotation can substantially improve detection performance.

https://arxiv.org/abs/2204.00147

* In person: ETS-LIVIA, room A-3600.

[Seminar] Structural Equation Modeling to latent causal representation learning for more trustable ML

The next LIVIA seminar will be held on Thursday, July 21st at 12h00 in hybrid mode.

Title: Structural Equation Modeling to latent causal representation learning for more trustable ML
by Prof. Myriam Tami, Paris-Saclay, CentraleSupélec

Abstract:
Structural equation models (SEM) with latent variables (LVs) are used to model relationships between observable and latent variables. We will present an approach for estimating an SEM model with LVs based on its global likelihood maximization by the EM algorithm. We will give the numerical results of this approach on simulated data and show, via an application on real environmental data, how to practically build a model and evaluate its quality. Finally, we apply the approach developed in the context of a clinical oncology trial to study the longitudinal quality of life data. We show that by effectively reducing the data dimension, the EM approach simplifies the longitudinal analysis of quality of life by avoiding multiple tests. Thus, it helps to facilitate the evaluation of the clinical benefit of a treatment.
Then, after introducing some key concepts from the causality field, we will motivate the interest in considering SEM models with LVs in this growing field of research. Indeed, identifying causal relationships between observed variables has drawn much attention in the fields of statistical learning and AI. This area, known as causal discovery, currently consists mainly of approaches that do not consider the presence of LVs and that encounter limitations when handling a very large number of variables. We will see that SEM with LVs can be a response to these limitations and constitutes an interesting line of research to explore together.

Bio:
Myriam TAMI (PhD 2016, University of Montpellier, Institut Montpelliérain Alexander Grothendieck, south of France) is an Associate Professor at University Paris-Saclay, CentraleSupélec, MICS lab. Her research focuses on AI, machine learning, representation learning, causality, and models for complex or heterogeneous data, e.g., multimodal, structured, or unstructured data, sometimes with latent variables, uncertainty, or weak labels. Her publications and research profile can be consulted on her web page or Google Scholar via the following links.
Web page: https://myriamtami.github.io/
Google scholar: https://scholar.google.com/citations?hl=fr&user=kavk5oUAAAAJ

[Seminar] Security in machine learning models and privacy-preserving data sharing

The next LIVIA seminar will be held on Thursday, June 23 at 12h00 in hybrid mode.

Title: Privacy-Preserving Data Sharing and Security in Machine Learning Models
by Prof. Mohammadhadi Shateri, Department of Systems Engineering

Abstract: AI is now widely recognized for its impact and importance in applications such as healthcare, social media, and transport. Any AI approach has two main components: the “learning model” and the “data”. Recent studies have mostly focused on boosting the efficiency of AI approaches by improving current models, developing more efficient learning algorithms, and collecting data samples. Although this is important, the fact that both the learning model and the process of collecting/sharing datasets can leak sensitive information about users has received less attention in the literature. In this talk, privacy issues regarding (machine) learning models and data sharing are discussed in terms of current attack/defense mechanisms. Some practical examples in applications such as smart meters will be presented, and several challenges and the current focus of research will be discussed.

Bio: Mohammadhadi Shateri received the Ph.D. in electrical engineering from McGill University, Montreal, Canada in 2021. He continued his work with McGill as a postdoctoral researcher until he joined École de technologie supérieure in June 2022 as an assistant professor. His research interests include machine learning, security of (machine) learning models, and secure data sharing with applications in health and smart grids, among others. He has won several scholarships supporting his research, including MEDA (McGill Engineering Doctoral Award), MGS (Manitoba Graduate Scholarship, Education and Advanced Learning, Province of Manitoba), and UMGF (University of Manitoba Graduate Fellowship).

* In person: ETS-LIVIA, room A-3600. Please confirm your presence if you attend in person.

[Seminar] Local overlap reduction procedure for dynamic ensemble selection

The next LIVIA seminar will be held on Thursday, May 19 at 12h00 in hybrid mode.

Title: Local overlap reduction procedure for dynamic ensemble selection
by Mariana A. Souza, Ph.D. candidate at LIVIA

Summary: (see paper in attachment)
Class imbalance is a characteristic known for making learning more challenging for classification models, as they may end up biased towards the majority class. A promising approach among the ensemble-based methods in the context of imbalanced learning is Dynamic Selection (DS). DS techniques single out a subset of the classifiers in the ensemble to label each given unknown sample according to their estimated competence in the area surrounding the query. Because only a small region is taken into account in the selection scheme, the global class disproportion may have less impact on the system’s performance. However, the presence of local class overlap may severely hinder the DS techniques’ performance over imbalanced distributions, as it not only exacerbates the effects of the under-representation but also introduces ambiguous and possibly unreliable samples to the competence estimation process. Thus, in this work, we propose a DS technique which attempts to minimize the effects of the local class overlap during the classifier selection procedure. The proposed method iteratively removes from the target region the instance perceived as the hardest to classify until a classifier is deemed competent to label the query sample. The known samples are characterized using instance hardness measures that quantify the local class overlap. Experimental results show that the proposed technique can significantly outperform the baseline as well as several other DS techniques, suggesting its suitability for dealing with class under-representation and overlap. Furthermore, the proposed technique still yielded competitive results when using an under-sampled, less overlapped version of the labelled sets, especially over the problems with a high proportion of minority class samples in overlap areas. Code available at https://github.com/marianaasouza/lords.

* In person: ETS-LIVIA, room A-3600.

[Seminar] Negative evidence for weakly supervised learning

The next LIVIA seminar will be held on Thursday, March 3rd at 12h00 by Zoom.

Title: Negative evidence for weakly supervised learning
by Soufiane Belharbi, postdoctoral fellow at LIVIA

Summary:
Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks. They allow for CNN visualization and interpretation without training on fully annotated image datasets. CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50. Due to convolution and pooling operations, these backbones yield low resolution CAMs with a down-scaling factor of up to 32, contributing to inaccurate localizations. Interpolation is required to restore full size CAMs, yet it does not consider the statistical properties of objects, such as color and texture, leading to activations with inconsistent boundaries and inaccurate localizations. As an alternative, we introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full resolution CAMs (F-CAMs). In particular, we propose a trainable decoding architecture that can be connected to any CNN classifier to produce highly accurate CAM localizations. Given an original low resolution CAM, foreground and background pixels are randomly sampled to fine-tune the decoder. Additional priors such as image statistics and size constraints are also considered to expand and refine object boundaries. Extensive experiments, over three CNN backbones and six WSOL baselines on the CUB-200-2011 and OpenImages datasets, indicate that our F-CAM method yields a significant improvement in CAM localization accuracy. F-CAM performance is competitive with state-of-the-art WSOL methods, yet it requires fewer computations during inference. Additional experiments and ablations were conducted on histology datasets with a focus on negative evidence. Results showed the benefits of our method compared to state-of-the-art methods.

Papers:
https://arxiv.org/abs/2109.07069
https://arxiv.org/abs/2201.02445

[Seminar] Deep Generative Models for Molecule Optimization

The next LIVIA seminar will be held on Thursday, February 3rd at 12h00 by Zoom.

Title: Deep Generative Models for Molecule Optimization
by Dr. Xia Ning, Associate Professor in the Biomedical Informatics Department, and the Computer Science and Engineering Department at the Ohio State University

Summary:
Molecule optimization is a critical step in drug development to improve desired properties of drug candidates through chemical modification. In this talk, I will present a novel deep generative model Modof over molecular graphs for molecule optimization. We developed Modof leveraging the most advanced deep learning approaches that enable profound molecule structure representation learning and new molecule generation through sampling from molecule representations and encoding. Following the rationale of fragment-based drug design, Modof modifies a given molecule by predicting a single site of disconnection at the molecule and the removal and/or addition of fragments at that site. A pipeline of multiple, identical Modof models is implemented into Modof-pipe to optimize molecules at multiple disconnection sites. Here we show that Modof-pipe can retain major molecular scaffolds, allow controls over intermediate optimization steps, and better constrain molecule similarities. Modof-pipe outperforms the state-of-the-art methods on benchmark datasets, with a 121.0% property improvement without molecular similarity constraints, and 82.0% and 10.6% improvement if the optimized molecules are at least 0.2 and 0.4 similar to those before optimization, respectively. I will also briefly present our other work on drug candidate prioritization and drug selection using machine learning.

[Seminar] Few-Shot Object Detection in Aerial Images

The next LIVIA seminar will be held on Wednesday, September 28 at 12h00 in hybrid mode.

Title: Few-Shot Object Detection in Aerial Images
by Pierre Le Jeune, Ph.D. candidate at the L2TI, University Sorbonne Paris Nord

Abstract: Object detection is a challenging task in computer vision. Recently, deep learning-based methods have surpassed classical algorithms in terms of both quality and speed. However, deep learning requires large annotated training sets to achieve such performance. Few-Shot Learning (FSL) aims to overcome this shortcoming by learning more efficiently on scarce data. While FSL has been widely explored in the literature, Few-Shot Object Detection (FSOD) only became a topic of interest very recently. Most authors develop and benchmark their methods on natural images, and nothing guarantees that their performance transfers to other kinds of images. This work focuses on applying FSOD to aerial images. First, we review the definition of FSOD and several existing methods to address this task. A performance analysis is done on aerial and natural images to understand the challenges of using such methods on aerial images. In light of this analysis, we propose a novel attention mechanism. It specifically targets small objects, which appear to be extremely difficult to detect in the few-shot regime. Finally, we question the relevance of Intersection over Union (IoU) as a criterion for box similarity and suggest a scale-dependent version, Scaled-IoU, which agrees better with human perception.
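
For readers unfamiliar with the criterion questioned at the end of the abstract, the following minimal sketch computes the standard IoU of two axis-aligned boxes and shows why it can be harsh on small objects. The box format (x1, y1, x2, y2) and the example coordinates are assumptions chosen for illustration; the Scaled-IoU proposed in the talk is not reproduced here, since the announcement does not define it.

```python
def iou(box_a, box_b):
    """Standard Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # The same 2-pixel horizontal shift applied to a small and a large box
    small = iou((0, 0, 10, 10), (2, 0, 12, 10))
    large = iou((0, 0, 100, 100), (2, 0, 102, 100))
    print(f"small box IoU: {small:.3f}")
    print(f"large box IoU: {large:.3f}")
```

The identical localization error costs the 10-pixel box far more IoU than the 100-pixel box, which is the kind of scale sensitivity that motivates a scale-dependent criterion.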

Bio: Pierre LE JEUNE is a PhD student at the L2TI laboratory, University Sorbonne Paris Nord, while working at the company COSE. He received the M.Sc. degree in Mathematical Modelling and Computation from Danish Technical University (Copenhagen) and the M.Sc. in engineering from Centrale Nantes. His current research interests include Few-Shot Learning, Computer Vision and Deep Learning.

* In person: ETS-LIVIA, room A-3600.

[Seminar] Representation Learning for Vision and Language

Samira’s work spans several areas in deep learning research, including multi-modal learning, knowledge distillation, deep reinforcement learning, and applications. She made significant contributions to the field of human-computer interaction with her work on multi-modal learning for emotion recognition in videos. She also worked on visual reasoning at the intersection of vision and text. She contributed to the creation of several large-scale benchmarks including FigureQA (visual reasoning on mathematical plots), Something-Something (fine-grained video captioning) and ReDial (conversational movie recommendation). On the application side, she works on machine learning for disaster response with a focus on modeling of extreme weather events.