Thesis and Publications

Deep Learning-Based Background Noise Classification and Reduction for Audio Enhancement

Rakibul Islam, Rashik Rahman

2023

Background noise degrades audio quality and makes it harder to concentrate on the intended sounds. Classifying and reducing background noise is therefore central to audio enhancement: it identifies undesired noise and diminishes it, which improves the intelligibility and quality of recordings and keeps the intended sound dominant. The objective is to improve the user's auditory experience by minimizing ambient or unnecessary sounds in audio applications. Because fluctuations in pitch and volume make audio difficult to categorize accurately, this research proposes a deep learning-based approach to background noise classification and reduction for audio enhancement. Using deep learning techniques, in particular Long Short-Term Memory (LSTM) networks, the method achieves an accuracy of 88.35%. The system is trained on the UrbanSound8K dataset, which contains 8,732 audio files in WAV format. Background noise identification is improved by a novel sliding window technique that incorporates both audio wavelet characteristics and time information. In addition, a comparative analysis of Tiny Machine Learning (TinyML) models is used to construct a robust and efficient background noise classification and reduction model. Future research will develop an LSTM model built on TinyML to enhance audio and correct noise in practical situations.

Deep Learning, RNN, Audio, LSTM

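
The abstract above mentions a sliding window that combines wavelet characteristics with time information but gives no implementation details. The following is only a minimal sketch of how such framing might look, not the authors' method. The window size, hop size, sample rate, the choice of a one-level Haar wavelet, and the function names `sliding_windows`, `haar_energies`, and `window_features` are all assumptions made for illustration.

```python
import numpy as np


def sliding_windows(signal, window_size, hop_size):
    """Split a 1-D signal into overlapping windows.

    Returns (windows, starts): windows has shape (n_windows, window_size),
    starts holds the first sample index of each window.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1:
        raise ValueError("expected a 1-D signal")
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one window")
    n_windows = 1 + (len(signal) - window_size) // hop_size
    starts = np.arange(n_windows) * hop_size
    # Index matrix: each row selects the samples of one window.
    index = starts[:, None] + np.arange(window_size)[None, :]
    return signal[index], starts


def haar_energies(window):
    """One-level Haar transform of an even-length window.

    Returns the energy of the approximation (low-pass) band and of the
    detail (high-pass) band. Haar is an assumed stand-in wavelet.
    """
    if len(window) % 2 != 0:
        raise ValueError("window length must be even")
    even, odd = window[0::2], window[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return np.sum(approx ** 2), np.sum(detail ** 2)


def window_features(signal, sample_rate, window_size=1024, hop_size=512):
    """Build one feature row per window: [start time in s, approx energy, detail energy]."""
    windows, starts = sliding_windows(signal, window_size, hop_size)
    rows = []
    for window, start in zip(windows, starts):
        approx_energy, detail_energy = haar_energies(window)
        rows.append([start / sample_rate, approx_energy, detail_energy])
    return np.array(rows)


if __name__ == "__main__":
    sample_rate = 8000  # hypothetical sample rate
    t = np.arange(4096) / sample_rate
    tone = np.sin(2 * np.pi * 440 * t)  # synthetic test tone, not dataset audio
    features = window_features(tone, sample_rate)
    print(features.shape)
```

Each row pairs a time stamp with per-window wavelet energies, so the resulting sequence could be fed to a recurrent model such as an LSTM. A real pipeline would likely use deeper wavelet decompositions and richer features.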

Real-time Object Detection Using Lightweight Neural Networks

2023

We introduce a lightweight neural network architecture for real-time object detection on edge devices. Our approach achieves state-of-the-art performance while reducing computational complexity by 60% compared to existing methods. The proposed architecture combines novel attention mechanisms with depthwise separable convolutions to optimize both speed and accuracy. Extensive experiments on the COCO and PASCAL VOC datasets demonstrate that our method outperforms previous lightweight models while running at over 30 FPS on mobile devices.
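
To illustrate why depthwise separable convolutions reduce computation, the sketch below counts multiply-accumulate operations (MACs) for a single layer in both forms. It assumes stride 1 and "same" padding, so the output keeps the input's spatial size. The layer dimensions and function names are hypothetical and are not taken from the paper. The 60% figure in the abstract refers to the whole architecture and is not reproduced here.

```python
def standard_conv_macs(height, width, c_in, c_out, kernel):
    """MACs for a standard convolution (stride 1, same padding assumed)."""
    return height * width * c_in * c_out * kernel * kernel


def depthwise_separable_macs(height, width, c_in, c_out, kernel):
    """MACs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = height * width * c_in * kernel * kernel  # one k x k filter per input channel
    pointwise = height * width * c_in * c_out            # 1x1 convolution mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
    std = standard_conv_macs(56, 56, 64, 128, 3)
    sep = depthwise_separable_macs(56, 56, 64, 128, 3)
    print(f"standard:  {std:,} MACs")
    print(f"separable: {sep:,} MACs")
    print(f"ratio:     {sep / std:.4f}")
```

For a k x k kernel and C_out output channels, the ratio of separable to standard cost is 1/C_out + 1/k^2, so the saving for a single layer grows with both kernel size and channel count.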