Detection and localization of audio event for home surveillance using CRNN

V.S. Suruthhi, V. Smita, J. Rolant Gini, K. I. Ramachandran

Abstract


Safety and security are a high priority in people’s lives, and a home surveillance system helps keep people and their property secure. This paper proposes an audio surveillance system that both detects and localizes audio (sound) events. The combined task of detecting and localizing audio events is known as Sound Event Localization and Detection (SELD). In this work, SELD is performed with a Convolutional Recurrent Neural Network (CRNN) architecture, which stacks convolutional neural network (CNN), recurrent neural network (RNN), and fully connected neural network (FNN) layers. The CRNN takes multichannel audio as input, extracts features, and performs detection and localization of the input audio events in parallel. The SELD results obtained by the CRNN with gated recurrent units (GRU) and with long short-term memory (LSTM) units are compared and discussed. The CRNN with LSTM units achieves a 75% F1 score and 82.8% frame recall for one overlapping sound. The proposed audio surveillance system with LSTM units therefore gives better detection and overall performance for one overlapping sound.
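The abstract reports an F1 score and a frame recall but does not define them. The following minimal sketch shows one common way these two quantities are computed in the SELD literature, and it is an assumption rather than the authors' evaluation code: F1 is taken frame by frame over (frame, class) activity flags, and frame recall is taken as the fraction of frames in which the predicted number of active sources equals the reference number. The function name `f1_and_frame_recall` and the toy data are hypothetical.

```python
def f1_and_frame_recall(reference, predicted):
    """Compute frame-level F1 and frame recall.

    reference, predicted: lists of frames; each frame is a list of
    0/1 activity flags, one per sound class (assumed layout).
    """
    tp = fp = fn = 0
    matching_frames = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        for r, p in zip(ref_frame, pred_frame):
            if r and p:
                tp += 1          # class active in both
            elif p and not r:
                fp += 1          # predicted but not in reference
            elif r and not p:
                fn += 1          # in reference but missed
        # Assumed frame-recall criterion: same number of active sources.
        if sum(ref_frame) == sum(pred_frame):
            matching_frames += 1
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    frame_recall = matching_frames / len(reference) if reference else 0.0
    return f1, frame_recall


# Toy example: 4 frames, 2 classes (made-up data, not from the paper).
reference = [[1, 0], [1, 0], [0, 1], [0, 0]]
predicted = [[1, 0], [0, 0], [0, 1], [0, 1]]
f1, frame_recall = f1_and_frame_recall(reference, predicted)
print(f"F1 = {f1:.3f}, frame recall = {frame_recall:.3f}")
# prints "F1 = 0.667, frame recall = 0.500"
```

In the toy data, two of the four frames have the correct source count, so frame recall is 0.5, while two true positives, one false positive, and one false negative give an F1 of 2/3.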

International Journal of Electronics and Telecommunications
is a periodical of Electronics and Telecommunications Committee
of Polish Academy of Sciences

eISSN: 2300-1933