Patrick Cardinal
Department of Software Engineering and IT
Patrick Cardinal has been a professor in the Department of Software Engineering and IT at ÉTS since January 2015. Since graduating from ÉTS in 2000, he has worked continuously in the field of automatic speech recognition. Before joining the department, he completed a 15-month postdoctoral fellowship at the Massachusetts Institute of Technology (MIT) under the supervision of James Glass. Before that, he held various research positions at the Computer Research Institute of Montreal (CRIM). During his 13 years at CRIM, he completed a Master’s degree in Computer Science at McGill University (2003) and a PhD in Engineering at ÉTS.
+ Research
- Machine learning algorithms
- Speech recognition
- Language identification
- Emotion recognition
+ Teaching
- LOG-320: Data Structures and Algorithms
- MTI-815: Voice Communication Systems
+ University education
2013
Ph.D., Software Engineering, École de technologie supérieure (ÉTS), Canada
2003
Master’s, Computer Science, McGill University, Canada
2000
Bachelor’s, Electrical Engineering, École de technologie supérieure (ÉTS), Canada
+ Experiences in teaching, research or industry
2015/01 to present
Associate Professor, École de technologie supérieure (ÉTS)
2010/01 to present
Review Manager, Ordre des ingénieurs du Québec
2015/01 to 2016/01
Research Affiliate, Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT)
2013/09 to 2014/11
Postdoctoral Associate, Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT)
2006/05 to 2013/08
Lecturer, Software Engineering and Information Technology, École de technologie supérieure (ÉTS)
2012/06 to 2013/08
Research Advisor and Deputy Director of the Speech Recognition team, Computer Research Institute of Montreal (CRIM)
2007/06 to 2012/06
Research Advisor, Speech Recognition, Computer Research Institute of Montreal (CRIM)
2005/06 to 2007/06
Senior Research Officer, Speech Recognition, Computer Research Institute of Montreal (CRIM)
2004/01 to 2006/05
Laboratory Manager, Software Engineering and Information Technology, École de technologie supérieure (ÉTS)
2000/01 to 2005/06
Research Officer, Speech Recognition, Computer Research Institute of Montreal (CRIM)
1998/05 to 2002/08
Laboratory Manager, Electrical Engineering, École de technologie supérieure (ÉTS)
+ Original articles in refereed journals and book chapters
«From environmental sound representation to robustness of 2D CNN models against adversarial attacks» Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
«Multidiscriminator Sobolev defense-GAN against adversarial attacks for end-to-end speech systems» Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
«Bi-discriminator GAN for tabular data synthesis» Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, François-Xavier Devailly, Wissem Maazoun, Patrick Cardinal
«Detection and classification of human-produced nonverbal audio events» Philippe Chabot, Rachel E. Bouserhal, Patrick Cardinal, Jérémie Voix
«Cyclic defense GAN against speech adversarial attacks» Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
+ Papers in refereed conference proceedings
«Towards robust speech-to-text adversarial attack» Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
«Class-conditional defense GAN against end-to-end speech attacks» Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
«Cross attentional audio-visual fusion for dimensional emotion recognition» R. Gnana Praveen, Eric Granger, Patrick Cardinal
«RADARSAT-2 synthetic-aperture radar land cover segmentation using deep convolutional neural networks» Mirmohammad Saadati, Marco Pedersoli, Patrick Cardinal, Peter Oliver
«Deep weakly supervised domain adaptation for pain localization in videos» R. Gnana Praveen, Eric Granger, Patrick Cardinal
+ Undergraduate teaching
LOG-320: Data Structures and Algorithms
This course provides specific knowledge of data structures and algorithms in software engineering. Students learn to understand and use asymptotic analysis to judiciously choose the appropriate data structures and the best type of algorithm to solve a problem effectively while respecting constraints and available resources.
At the end of this course, students will be able to choose among a variety of basic data structures (array, queue, stack or list) and more advanced ones (trees, graphs, hash tables) to solve problems of varying complexity. They will also be able to combine and adapt these structures to deal with different situations.
Students will also be able to choose a type of algorithm and analyze its overall performance on basic problems that involve, for example, graph search, combinatorial optimization or string search.
+ Graduate teaching
MTI-815: Voice Communication Systems
Following this course, the student will be able to:
• explain how voice communication systems work;
• choose a voice communication system suited to a given need;
• evaluate voice communication systems.
Voice communication by computer. Modes of production and perception of speech. How computers compress, encode, synthesize and recognize the speech signal. Encoding techniques (PCM, ADPCM, LPC, ACELP), voice synthesis (Klatt, LPC, PSOLA) and speech, speaker and emotion recognition (HMM, DMM, GMM).
+ Other courses taught as a lecturer
- GTI-770: Intelligent Systems and Machine Learning
- GTI-410: Application of digital technologies in graphics and imagery
- LOG-710: Operating Systems and System Programming
+ Available Projects
Most of the available projects involve artificial intelligence algorithms and allow students to acquire the following skills:
- Understanding and use of machine learning algorithms such as neural networks, SVMs, random forests, etc.
- Use of advanced techniques such as ensemble methods or transfer learning
- Extraction of information from different signals (audio, video, ECG, etc.)
- And more…
Integrated speech processing system for human-robot rehab interaction
Summary
This project, carried out in collaboration with the Université de Sherbrooke and the Université de Montréal, aims to create a robot able to assist patients with various diseases. At the moment, we are focusing on patients with dysarthria caused by degenerative diseases (Friedreich's ataxia, for example). Dysarthria is a motor disorder that leads to pronunciation difficulties, which become more severe as the disease progresses. In the short term, the goal is to develop applications that help patients break their social isolation by enabling them to communicate better with others. For example, a communication aid could take over when the patient has difficulty speaking. The system should be able to determine automatically whether help is needed. Smart applications that assist with exercises are also being considered.
Postdoc
Mohammed Senoussaoui (Speaker identification)
Detection of emotions and/or level of depression
Summary
This project aims to create an application that lets a therapist monitor a depressive patient on a weekly basis. The tool will give the therapist a better idea of the patient’s emotional state between appointments and, above all, in real-life situations. These situations are sampled at random so that the patient’s behavior is not affected by knowing that an evaluation is taking place.
The evaluation of the patient will be based on audiovisual information. The application, running on a mobile device, will capture audio and/or video segments from which the level of depression will be determined.
Master’s student
Rafooneh Jafarian Bahri
Detection of a person’s stress level
Summary
The primary goal of this project is to determine if heart signals can be a good predictor of stress levels while avoiding confusion with a change in rhythm caused by physical activity. For this project, we have developed a database with several modalities (audio / video / ECG) with three types of annotations:
- Stress felt by the subject;
- Stress perceived by two experts;
- Cortisol level.
From this database, several studies involving machine learning techniques are possible, such as detecting a person’s stress from audio or images.
This project is carried out in collaboration with Pierrich Plusquellec from the Université de Montréal.
PhD Students
Patrice Boucher
Detection of vocal stereotypy in autistic children
Summary: To comfort themselves, some children with autism emit certain sounds repetitively. This behavior generally makes people around the child uncomfortable, which greatly affects the child's integration and development. Psychoeducators can use several types of therapy to reduce the level of vocal stereotypy. The problem is that evaluating a therapy involves recording and analyzing two videos (before and after treatment) to determine whether the therapy has been effective. The goal of this project is to create software that analyzes audio recordings to evaluate the effectiveness of a therapy. This software will allow significant efficiency gains for therapists.
Dialect Detection (Arabic)
Summary
The goal is to identify the dialect a person is speaking from an audio signal. This detection stage makes it possible to select the correct speech recognition system in order to automatically transcribe the audio content of a recording in Arabic.
+ Past Projects
Free Text Detection in an Abstract Command
This project, in partnership with Nuance Communications, aimed to test whether the use of prosodic information could improve free text detection in a command. Free text is a piece of text in a command that should not be parsed by the parser. For example, if a command is “Write to Mary: I’m going to be 5 minutes late,” the system does not need to parse the free text section (“I’m going to be 5 minutes late”) since it is irrelevant for determining the action of sending a message.
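The distinction between the command and its free text can be illustrated with a toy sketch on an already-transcribed command. This is not the project's method: the project worked on prosodic cues in the audio, where no written delimiter exists. The function name `split_free_text` and the use of a colon as the boundary are assumptions made purely for illustration.

```python
def split_free_text(command, delimiter=":"):
    """Split a transcribed command into (parsable part, free text).

    Toy illustration only: it assumes the boundary is already marked
    by a written delimiter, which is not the case in real speech.
    """
    head, sep, tail = command.partition(delimiter)
    if not sep:
        # No boundary found: the whole command must be parsed.
        return command.strip(), ""
    return head.strip(), tail.strip()


action, free_text = split_free_text("Write to Mary: I'm going to be 5 minutes late")
print(action)     # part handed to the parser
print(free_text)  # part passed through untouched
```

Only the first part would be handed to the command parser; the second would be forwarded as the message body without being interpreted.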
Results
The correlation between prosodic information and the presence of free text is strong, but using it did not bring a significant improvement in performance. However, the available data were not sufficiently representative to reach a definitive conclusion. Other experiments will be carried out by Nuance Communications.
Master’s student
Simon Boutin (graduated in May 2016)
Implementation of the PSOLA algorithm
Summary
This project consisted of creating software to speed up or slow down an audio recording without changing its pitch. This makes it possible to shorten or lengthen a recording without the change being perceptible to the human ear.
Master’s student (project-based)
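Changing a recording's duration while leaving its frequency content in place can be sketched with plain overlap-add (OLA), a simpler relative of PSOLA. This is not the project's implementation: PSOLA places its windows on detected pitch periods, whereas this sketch uses fixed-size frames and will therefore produce audible artifacts on real speech. The function name `time_stretch_ola` and the parameters `frame_len` and `synth_hop` are assumptions chosen for illustration.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def time_stretch_ola(samples, rate, frame_len=512, synth_hop=128):
    """Time-stretch a mono signal by overlap-add.

    rate > 1 shortens the signal (faster playback), rate < 1 lengthens it.
    Frames are read every `synth_hop * rate` samples and written every
    `synth_hop` samples, so local waveform shape (and thus pitch) is kept.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    analysis_hop = max(1, round(synth_hop * rate))
    window = hann(frame_len)
    n_frames = max(1, 1 + (len(samples) - frame_len) // analysis_hop)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # accumulated window weight per output sample
    for k in range(n_frames):
        a = k * analysis_hop  # read position in the input
        s = k * synth_hop     # write position in the output
        for i in range(frame_len):
            if a + i >= len(samples):
                break
            out[s + i] += samples[a + i] * window[i]
            norm[s + i] += window[i]
    # Divide by the summed window weight to undo the overlap gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# One second of a 220 Hz tone at 8 kHz, played twice as fast and half as fast.
sr = 8000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
fast = time_stretch_ola(tone, 2.0)
slow = time_stretch_ola(tone, 0.5)
print(len(tone), len(fast), len(slow))
```

The output lengths are only approximately `len(samples) / rate`, because the signal is processed in whole frames.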
Freud Romero (graduated in January 2016)
+ Previous Projects (at CRIM)
C³GRID Project
Summary
The C³GRID project was aimed at developing a computing grid for distributed learning of acoustic, visual and speech recognition models.
Result
The team contributed to the module that extracts visual features of the mouth shape, in order to make recognition more robust in noisy acoustic environments.
RAP Project
Summary: The RAP (automatic speech recognition) project focused on the automatic transcription of House of Commons debates and committee testimony to give people who are deaf or hard of hearing access to information, creating universal multimodal access to live debates in the Canadian Parliament.
MADIS Project
Summary: The MADIS project aimed to develop a benchmark for indexing and searching film content as part of the MPEG-7 standard for the National Film Board of Canada (NFB).
STDirect Project (TVA)
Summary
The STDirect system allows the subtitling of live broadcasts at low cost. It adapts automatically to the news and can easily be adapted to another language.
After the integration of STDirect into the TVA Group, a new company, SOVO Technologies, was created. It uses STDirect to provide subtitling for:
- All sports live on RDS, RDS2 and RDS Info Sports
- Several programs for Télé-Québec, TVA, Canal Vie, CBC, RDI, TFO, CPAC
- Special events such as the 2010 FIFA World Cup, live coverage of the Bastarache Commission, or the Vancouver 2010 Olympic Winter Games.
RyshcoMedia Project
Summary
This project involved the development of voice alignment technology and its integration into a post-synchronization and dubbing support system for Ryshco Media (now DubSynchro). This firm specializes in dubbing for film and television.
E-Inclusion Project
Summary: The objective of the E-Inclusion Network is to operate a network of users, artists, producers and researchers to develop audiovisual content processing tools and methods of content creation, to enable creators and producers of multimedia content to enhance the richness of the multimedia experience of people with sensory disabilities by automating aspects of the multimedia production and post-production process. This project is funded in part by Canadian Heritage.
Contact Us
Pavillon Principal (A)
1100, rue Notre-Dame Ouest
Montréal, QC, H3C 1K3
Room A-3600
Tel.: +1 (514) 396-8650
E-Mail: eric.granger@etsmtl.ca