Regular Members


Patrick Cardinal

Director and Professor
Department of Software Engineering and IT

Office: A-4486
Telephone: 514 396-8573
Fax: 514 396-8405
patrick.cardinal@etsmtl.ca


Patrick Cardinal has been a professor in the Department of Software Engineering and IT at ÉTS since January 2015. Since graduating from ÉTS in 2000, he has worked in the field of automatic speech recognition. Before joining the department, he completed a 15-month postdoctoral appointment at the Massachusetts Institute of Technology (MIT) under the supervision of James Glass. Previously, he held various research positions at the Computer Research Institute of Montréal (CRIM). During his 13 years at CRIM, he completed a Master’s degree in Computer Science at McGill University (2003) and a PhD in Engineering at ÉTS.

+ Research

Machine learning algorithms
Speech Recognition
Language identification
Emotion recognition

+ Teaching

LOG-320: Data Structures and Algorithms
MTI-815: Voice Communication Systems

+ University education

2013

Ph.D., Software Engineering, École de technologie supérieure (ÉTS), Canada

2003

Master’s degree, Computer Science, McGill University, Canada

2000

Bachelor’s degree, Electrical Engineering, École de technologie supérieure (ÉTS), Canada

+ Experiences in teaching, research or industry

2015/01 to present

Associate Professor at École de technologie supérieure (ÉTS)

2010/01 to present

Review Manager at Ordre des ingénieurs du Québec

2015/01 to 2016/01

Research Affiliate at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT)

2013/09 to 2014/11

Postdoctoral Associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT)

2006/05 to 2013/08

Lecturer in Software Engineering and Information Technology, École de technologie supérieure (ÉTS)

2012/06 to 2013/08

Research Advisor and Deputy Director of the Speech Recognition team, Computer Research Institute of Montréal (CRIM)

2007/06 to 2012/06

Research Advisor in Speech Recognition, Computer Research Institute of Montréal (CRIM)

2005/06 to 2007/06

Senior Research Officer in Speech Recognition, Computer Research Institute of Montréal (CRIM)

2004/01 to 2006/05

Laboratory Manager in Software Engineering and Information Technology, École de technologie supérieure (ÉTS)

2000/01 to 2005/06

Research Officer in Speech Recognition, Computer Research Institute of Montréal (CRIM)

1998/05 to 2002/08

Laboratory Manager in Electrical Engineering, École de technologie supérieure (ÉTS)

Publications

+ Original articles in refereed journals and book chapters

«From environmental sound representation to robustness of 2D CNN models against adversarial attacks»

Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
Applied Acoustics 2022.

«Multidiscriminator sobolev defense-GAN against adversarial attacks for end-to-end speech systems»

Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
IEEE Transactions on Information Forensics and Security 2022.

«Bi-discriminator GAN for tabular data synthesis»

Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, François-Xavier Devailly, Wissem Maazoun, Patrick Cardinal
Pattern Recognition Letters 2022.

«Detection and classification of human-produced nonverbal audio events»

Philippe Chabot, Rachel E. Bouserhal, Patrick Cardinal, Jérémie Voix
Applied Acoustics 2021.

«Cyclic defense GAN against speech adversarial attacks»

Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
IEEE Signal Processing Letters 2021.

«Towards robust speech-to-text adversarial attack»

Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
47th IEEE International Conference on Acoustics, Speech, and Signal Processing (Singapore, Singapore, May 23-27, 2022), p. 2869-2873. Institute of Electrical and Electronics Engineers Inc. 2022.

«Class-conditional defense GAN against end-to-end speech attacks»

Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (Toronto, ON, Canada, online, June 6-11, 2021), p. 2565-2569. Institute of Electrical and Electronics Engineers Inc. 2021.

«Cross attentional audio-visual fusion for dimensional emotion recognition»

R. Gnana Praveen, Eric Granger, Patrick Cardinal
16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) (Jodhpur, India, Dec. 15-18, 2021). Institute of Electrical and Electronics Engineers Inc. 2021.

«RADARSAT-2 Synthetic-Aperture radar land cover segmentation using deep convolutional neural networks»

Mirmohammad Saadati, Marco Pedersoli, Patrick Cardinal, Peter Oliver
Pattern Recognition. ICPR International Workshops and Challenges, Virtual Event, January 10-15, 2021, Proceedings Part VIII (Milan, Italy, Jan. 10-15, 2021), p. 106-117. Springer. 2021.

«Deep weakly supervised domain adaptation for pain localization in videos»

Gnana R. Praveen, Eric Granger, Patrick Cardinal
15th IEEE International Conference on Automatic Face and Gesture Recognition (FG) (Buenos Aires, Argentina, Nov. 16-20, 2020), p. 473-480. IEEE Computer Society. 2020.

+ Undergraduate teaching

LOG-320: Data Structures and Algorithms

This course provides specific knowledge of software engineering, data structures and algorithms. Students learn to understand and use asymptotic analysis to judiciously choose the appropriate data structures and the most suitable type of algorithm to solve a problem effectively while respecting constraints and available resources.

At the end of this course, students will be able to choose among a variety of basic data structures (array, queue, stack or list) and more advanced ones (trees, graphs, hash tables) to solve problems of varying complexity. They will also be able to combine and adapt these structures to handle different situations.

Students will also be able to choose a type of algorithm and analyze its overall performance for various basic problems involving, for example, graph search, combinatorial optimization or string search.
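
As a rough illustration of the kind of choice described above, the following Python sketch compares two ways of looking up a value in a sorted array. The function names are hypothetical and are not taken from the course material; the point is only that asymptotic analysis (O(n) versus O(log n)) guides which approach to pick when the data is already sorted.

```python
from bisect import bisect_left


def linear_search(items, target):
    """O(n): examine each element in turn until the target is found."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """O(log n): repeatedly halve the search range (input must be sorted)."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1
```

Both functions return the same index on sorted input, but the second performs far fewer comparisons on large arrays, at the cost of requiring the data to be kept sorted.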

+ Graduate teaching

MTI-815: Voice Communication Systems

Following this course, the student will be able to:
• explain the operation of voice communication systems;
• choose a voice communication system as needed;
• evaluate voice communication systems.

Voice communication by computer. Modes of speech production and perception. How computers compress, encode, synthesize and recognize the speech signal. Encoding techniques (PCM, ADPCM, LPC, ACELP), voice synthesis (Klatt, LPC, PSOLA), and speech, speaker and emotion recognition (HMM, DMM, GMM).

+ Other courses taught as a lecturer

GTI-770: Intelligent Systems and Machine Learning
GTI-410: Application of digital technologies in graphics and imagery
LOG-710: Operating Systems and System Programming

Research

+ Available Projects

Most of the available projects involve artificial intelligence algorithms and allow students to acquire the following skills:

Understanding and using machine learning algorithms such as neural networks, SVMs, random forests, etc.
Using advanced techniques such as ensemble methods or transfer learning
Extracting information from different signals (audio, video, ECG, etc.)
And more…

Integrated speech processing system for human-robot rehab interaction

Summary

This project, carried out in collaboration with the Université de Sherbrooke and the Université de Montréal, aims to create a robot able to assist patients with different diseases. At the moment, we are focusing on patients with dysarthria caused by degenerative diseases (Friedreich's ataxia, for example). Dysarthria is a motor disorder that leads to pronunciation difficulties that become more severe as the disease progresses. In the short term, the goal is to develop applications that help patients break social isolation by enabling them to communicate better with others. For example, a communication aid could take over when the patient has difficulty speaking. The system should be able to determine automatically whether help is needed. Smart applications to assist with exercises are also being considered.

Postdoc
Mohammed Senousaoui (Speaker identification)

Detection of emotions and/or level of depression

Summary

This project aims to create an application that allows a therapist to monitor a patient with depression on a weekly basis. The tool will give the therapist a better picture of the patient's emotional state between appointments and, above all, in real-life situations sampled at random so that the patient's behavior is not affected by knowing that they are being evaluated.

The evaluation of the patient will be based on audiovisual information. The application, running on a mobile device, will capture audio and/or video segments from which the level of depression will be determined.

Master's student
Rafooneh Jafarian Bahri

Detection of a person’s stress level

Summary

The primary goal of this project is to determine if heart signals can be a good predictor of stress levels while avoiding confusion with a change in rhythm caused by physical activity. For this project, we have developed a database with several modalities (audio / video / ECG) with three types of annotations:

Stress felt by the subject;
Stress perceived by two experts;
Cortisol level.

From this database, several studies involving machine learning techniques are possible, such as detecting a person's stress from audio or images.

This project is in collaboration with Pierrich Plusquellec from the Université de Montréal.

PhD student
Patrice Boucher

Detection of vocal stereotypy in autistic children

Summary: To comfort themselves, some children with autism emit certain sounds repetitively. This behavior generally makes people around the child uncomfortable, which greatly affects the child's integration and development. Psychoeducators can use several types of therapy to reduce the level of vocal stereotypy. The problem is that evaluating a therapy involves recording and analyzing two videos (before and after treatment) to determine whether the therapy has been effective. The goal of this project is to create software that analyzes audio recordings to evaluate the effectiveness of a therapy. This software will allow significant efficiency gains for therapists.

Dialect Detection (Arabic)

Summary

This project aims to identify, from an audio signal, the dialect a person is speaking. The purpose of this research is to create a detection stage that selects the appropriate speech recognition system in order to automatically transcribe the audio content of a recording in Arabic.

+ Old Projects

Free Text Detection in a Command

Summary

This project, in partnership with Nuance Communications, aimed to test whether prosodic information could improve free text detection in a command. Free text is a piece of text in a command that should not be analyzed by the parser. For example, if a command is “Write to Mary: I’m going to be 5 minutes late,” the system does not need to parse the free text section (“I’m going to be 5 minutes late”) since it is irrelevant to determining the action of sending a message.

Results

The correlation between prosodic information and the presence of free text is strong, but using it did not bring a significant improvement in performance. However, the available data were not sufficiently representative to reach a definitive conclusion. Further experiments will be carried out by Nuance Communications.

Master’s student
Simon Boutin (graduated in May 2016)

Implementation of the PSOLA algorithm

Summary

This project consisted of creating software to speed up or slow down an audio recording without changing its pitch. This makes it possible to lengthen or shorten a recording without the modification being perceptible to the human ear.
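
A minimal sketch of the time-stretching idea, assuming a mono signal stored as a list of floats. Note that it uses plain overlap-add (OLA) with fixed-size frames rather than PSOLA itself, which additionally aligns frames on pitch periods; the function name and parameters are hypothetical, and this is not the project's implementation.

```python
import math


def time_stretch_ola(samples, rate, frame_len=256):
    """Change duration by roughly 1/rate using plain overlap-add (not PSOLA).

    rate > 1 shortens the signal (faster playback), rate < 1 lengthens it.
    Frames are copied without resampling, so the local pitch is preserved.
    """
    synth_hop = frame_len // 2                      # spacing of frames in the output
    ana_hop = max(1, int(round(synth_hop * rate)))  # spacing of frames in the input
    # Periodic Hann window: overlapping copies sum to a constant at 50% overlap.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(1, (len(samples) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for i in range(n_frames):
        src = i * ana_hop
        dst = i * synth_hop
        for n in range(frame_len):
            if src + n < len(samples):
                out[dst + n] += samples[src + n] * window[n]
                norm[dst + n] += window[n]
    # Divide by the accumulated window weight to keep the amplitude stable.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Calling `time_stretch_ola(samples, 2.0)` yields a signal about half as long, and `time_stretch_ola(samples, 0.5)` one about twice as long. Because plain OLA ignores pitch periods, it can introduce audible artifacts on voiced speech, which is the problem pitch-synchronous methods such as PSOLA are designed to reduce.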

Master's student (project-based)
Freud Romero (graduated in January 2016)

+ Previous Projects (at CRIM)

C³GRID Project

Abstract

The C³GRID project was aimed at developing a computing grid for distributed learning of acoustic, visual and speech recognition models.

Result

The team contributed to the module that extracts visual features describing the shape of the mouth, in order to make recognition more robust in noisy acoustic environments.

RAP Project

Summary: The Automatic Speech Recognition (RAP, from the French "Reconnaissance automatique de la parole") project focused on the automatic transcription of House of Commons debates and committee testimony to give people who are deaf or hard of hearing access to information, creating universal multimodal access to live debates in the Canadian Parliament.

MADIS Project

Summary: The MADIS project aimed to develop a benchmark for indexing and searching film content as part of the MPEG-7 standard for the National Film Board of Canada (NFB).

STDirect Project

Summary

The STDirect system allows the subtitling of live broadcasts at low cost. It adapts automatically to the news and can easily be adapted to another language.

After the integration of STDirect into the TVA Group, a new company, SOVO technologies, was created. It uses STDirect to provide subtitling services for:

All sports live on RDS, RDS2 and RDS Info Sports
Several programs for Télé-Québec, TVA, Canal Vie, CBC, RDI, TFO, CPAC
Special events such as the 2010 FIFA World Cup, live coverage of the Bastarache Commission, or the Vancouver 2010 Olympic Winter Games.

RyshcoMedia Project

Abstract

This project involved developing voice alignment technology and integrating it into a post-synchronization and dubbing support system for Ryshco Media (now DubSynchro). This firm specializes in dubbing for film and television.

E-Inclusion Project

Summary: The objective of the E-Inclusion Network is to operate a network of users, artists, producers and researchers to develop audiovisual content processing tools and methods of content creation, to enable creators and producers of multimedia content to enhance the richness of the multimedia experience of people with sensory disabilities by automating aspects of the multimedia production and post-production process. This project is funded in part by Canadian Heritage.
