A joint cross-attention model for audio-visual fusion in dimensional emotion recognition

Praveen, R. G., de Melo, W. C., Ullah, N., Aslam, H., Zeeshan, O., Denorme, T., Pedersoli, M., Koerich, A. L., Bacon, S., Cardinal, P. and Granger, E.

In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (New Orleans, LA, USA, June 19-20, 2022), p. 2485-2494. Piscataway, NJ, USA: IEEE. 2022.