Comparison of Time-Frequency Representations for Environmental Sound Classification Using Convolutional Neural Networks

This essay reviews the paper by Muhammad Huzaifah, "Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks." The paper highlights the importance of environmental sound classification and examines how it can be performed using convolutional neural networks (CNNs).

CNNs were developed mainly for image and video applications. Recent advances in transforming 1-D audio signals into 2-D spectrogram images have made it possible to apply them to audio processing. The author uses several signal processing methods to observe their impact on classification performance on environmental sound datasets: the short-time Fourier transform (STFT) with linear and Mel frequency scales, the constant-Q transform (CQT) and the continuous wavelet transform (CWT).
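
To make the 1-D to 2-D step concrete, here is a minimal pure-Python sketch of a magnitude STFT. It is not the author's pipeline, which used librosa. It slides a Hann-windowed frame along the signal and takes a discrete Fourier transform of each frame. The window length (`n_fft`), the hop size and the helper names are illustrative assumptions. The `hz_to_mel` helper uses the HTK-style Mel formula, which is one common convention; librosa's default Mel scale is defined slightly differently.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window, a common choice for STFT analysis."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]


def stft_magnitude(signal, n_fft=256, hop=128):
    """Naive magnitude STFT.

    Returns one list of n_fft // 2 + 1 magnitudes (non-negative frequency
    bins) per frame; the frames laid side by side form the spectrogram.
    """
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        spectrum = [
            abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def hz_to_mel(f_hz):
    """HTK-style Mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


# Usage: a 1000 Hz tone sampled at 8000 Hz falls exactly on bin 1000 / 31.25 = 32.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(1024)]
spec = stft_magnitude(tone)
```

With `n_fft=256` at 8000 Hz the bins are 31.25 Hz apart, so every frame peaks in bin 32. A linear-scale spectrogram keeps these evenly spaced bins. A Mel-scale spectrogram instead pools them into filters that are evenly spaced in Mel units.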

The paper supports the hypothesis that the features most useful for sound classification depend strongly on the time-frequency representation used. Moreover, the best sliding-window length for the STFT depends on the characteristics of the audio signal, and 2-D convolution performs better than 1-D convolution.
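
The window-length finding reflects the usual time-frequency trade-off. A window of N samples at sample rate f_s gives a bin spacing of f_s / N and covers N / f_s seconds. The short sketch below tabulates this for a few window lengths. It assumes a 44.1 kHz sample rate, the rate of the ESC-50 clips. The window lengths and the helper name `stft_resolution` are illustrative assumptions, not values from the paper.

```python
def stft_resolution(n_fft, sample_rate):
    """Return (bin spacing in Hz, window duration in ms) for an STFT window."""
    return sample_rate / n_fft, 1000.0 * n_fft / sample_rate


# Illustrative window lengths at an assumed 44.1 kHz sample rate
for n_fft in (256, 1024, 4096):
    df, dt = stft_resolution(n_fft, 44100)
    print(f"n_fft={n_fft:5d}: bin spacing {df:7.2f} Hz, window {dt:6.1f} ms")
```

A 256-sample window gives bins about 172 Hz apart but lasts only about 5.8 ms. A 4096-sample window gives bins about 10.8 Hz apart but spans about 93 ms. Short (wideband) windows therefore suit brief transients, long (narrowband) windows suit sustained tonal sounds, and no single length is best for every sound class.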

Since the author uses CNNs for classification, conventional features such as Mel-frequency cepstral coefficients (MFCCs) and Perceptual Linear Prediction (PLP) coefficients are not strictly necessary. These features were designed as the basic building blocks of Gaussian mixture model (GMM)-based Hidden Markov Models (HMMs). They include a decorrelation step because GMMs with diagonal covariance matrices assume uncorrelated features. Deep learning models learn their own feature maps and do not require decorrelated inputs.
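
MFCCs obtain their decorrelated form from a discrete cosine transform (DCT) applied to log Mel filterbank energies. The minimal sketch below assumes an orthonormal DCT-II and keeps 13 coefficients, a common choice that is not necessarily the paper's. The function names `dct_ii` and `mfcc_frame` are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                               for i in range(n)))
    return out


def mfcc_frame(mel_energies, n_coeffs=13, eps=1e-10):
    """MFCCs for one frame: log-compress the Mel filterbank energies, then
    decorrelate them with a DCT and keep the first n_coeffs coefficients."""
    log_mel = [math.log(e + eps) for e in mel_energies]
    return dct_ii(log_mel)[:n_coeffs]


# A perfectly flat (constant) log spectrum puts all its energy in coefficient 0.
flat = dct_ii([1.0] * 8)
```

Neighbouring Mel filters overlap, so their log energies are strongly correlated. The DCT concentrates that shared information into a few low-order coefficients, which is what a diagonal-covariance GMM needs. A CNN can instead learn directly from the correlated spectro-temporal image.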

This paper builds on previous studies that compared the STFT, the fast wavelet transform (FWT) and the CWT against the conventional approaches mentioned above. It then examines the specifics of CNN models in depth.

The author applies the STFT on both linear and Mel scales, the CQT and the CWT, and compares them with baseline MFCC features on two publicly available environmental sound datasets, ESC-50 and UrbanSound8K. The comparison is based on the classification performance of several CNN variants.

ESC-50 and UrbanSound8K are collections of short environmental recordings that span distinct classes (50 and 10 classes respectively), such as animal sounds, human non-speech sounds, car horns and drilling. In the pre-processing stage, four time-frequency representations were extracted in addition to the MFCC cepstrogram: a linear-scaled STFT spectrogram, a Mel-scaled STFT spectrogram, a CQT spectrogram and a CWT scalogram.

The remaining transforms were extracted in a similar way to the STFT. The CQT can be thought of as a series of logarithmically spaced filters with center frequencies $f_k$. Each filter's spectral width $\delta f_k$ is a fixed multiple of the previous filter's width:

$$f_k = f_{\min} \cdot 2^{k/n}, \qquad \delta f_k = 2^{1/n} \, \delta f_{k-1},$$

where $\delta f_k$ is the bandwidth of the $k$-th filter, $f_{\min}$ is the central frequency of the lowest filter, and $n$ is the number of filters per octave.
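
The filter-bank relation can be checked numerically. The sketch below places center frequencies at f_min · 2^(k/n) and assumes Brown's constant-Q definition, Q = 1 / (2^(1/n) - 1), so that every filter has the same ratio of center frequency to bandwidth. The values f_min = 32.70 Hz (the note C1), 12 bins per octave and 84 bins are illustrative assumptions. The function `cqt_filter_bank` is hypothetical and is not librosa's CQT implementation.

```python
def cqt_filter_bank(f_min, n_bins, bins_per_octave):
    """Center frequencies and bandwidths of a constant-Q filter bank.

    Center frequencies follow f_k = f_min * 2**(k / n). Every filter shares the
    same quality factor Q = f_k / bandwidth_k, so each bandwidth is 2**(1/n)
    times the previous one.
    """
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)  # Brown's constant-Q definition
    centers = [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]
    widths = [f / q for f in centers]
    return centers, widths, q


# Illustrative settings: start at C1 (about 32.70 Hz), 12 bins per octave, 7 octaves
centers, widths, q = cqt_filter_bank(32.70, 84, 12)
ratio = widths[1] / widths[0]  # equals 2 ** (1 / 12), about 1.0595
```

With this choice of Q, each bandwidth equals the gap to the next center frequency. Low frequencies therefore get narrow filters and fine frequency resolution, while high frequencies get wide filters and fine time resolution.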

As with the STFT, both wideband and narrowband versions of the CQT were extracted. The CWT decomposes the signal into scaled and shifted wavelets rather than sinusoids. It was computed with 256 frequency bins and a Morlet mother wavelet, which has been used in previous audio recognition studies. Finally, MFCCs were computed and arranged as a cepstrogram, and the coefficients were normalized without further logarithmic scaling.
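
The wavelet idea can be illustrated with a naive CWT. This minimal sketch uses the complex Morlet form ψ(t) = π^(-1/4) e^(i·w0·t) e^(-t²/2) with w0 = 6. That is one common definition and may differ from the exact wavelet in the library the paper used. It also uses only a handful of scales instead of 256 to keep the example fast. The names `morlet` and `cwt_magnitude` are hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet: a Gaussian-windowed complex sinusoid.

    The small correction term needed for exact admissibility is omitted,
    which is a common approximation when w0 is about 6 or larger.
    """
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(signal, scales, w0=6.0):
    """Naive CWT: correlate the signal with scaled, shifted Morlet wavelets.

    Returns one row of coefficient magnitudes per scale, i.e. a scalogram
    with time along each row. Cost is O(len(scales) * len(signal)**2).
    """
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for b in range(n):
            acc = sum(signal[i] * morlet((i - b) / s, w0).conjugate()
                      for i in range(n))
            row.append(abs(acc) / math.sqrt(s))
        rows.append(row)
    return rows


# A sine at 0.05 cycles/sample (angular frequency 2*pi*0.05 rad/sample)
# matches the scale s = w0 / (2*pi*0.05), about 19.1.
sig = [math.sin(2 * math.pi * 0.05 * i) for i in range(128)]
scalogram = cwt_magnitude(sig, scales=[5, 19, 60])
```

Each scale acts as a band-pass filter centered near w0 / s radians per sample. In the middle of the signal, the scale 19 row responds far more strongly than the scale 5 and scale 60 rows. Stacking many such rows gives the scalogram image.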

To keep the input feature maps a consistent size, all the images were further downscaled with PIL using Lanczos resampling, which also increased processing speed. The Python libraries librosa and PyWavelets were used for audio processing. On the network side, two types of convolutional filter were considered: a 3×3 square filter and an M×3 rectangular filter. The rectangular filter spans all M frequency bins and therefore implements 1-D convolution over time.
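
The difference between the two filter shapes shows up in the output shape. This minimal sketch uses a naive "valid" (unpadded) 2-D cross-correlation, the operation a CNN convolution layer computes. The 8×10 input, the all-ones kernels and the name `conv2d_valid` are hypothetical. A real layer would also add a bias and learn many filters.

```python
def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D cross-correlation, as computed by a CNN conv layer.

    image:  list of rows, indexed [frequency_bin][time_frame]
    kernel: list of rows, indexed [frequency_offset][time_offset]
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(w - kw + 1)]
        for r in range(h - kh + 1)
    ]


M, T = 8, 10  # hypothetical: 8 frequency bins, 10 time frames
spectrogram = [[1.0] * T for _ in range(M)]
square = conv2d_valid(spectrogram, [[1.0] * 3 for _ in range(3)])
rect = conv2d_valid(spectrogram, [[1.0] * 3 for _ in range(M)])
```

The 3×3 filter slides along both frequency and time, giving a 6×8 output here. The M×3 filter covers the whole frequency axis at once and can only slide along time, so it leaves a single row. That single row is the 1-D convolution over time described above.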

The convolutional layers were interleaved with rectified linear unit (ReLU) activations and max-pooling layers. Overfitting degrades a model's performance on unseen data, so dropout was applied during training after the first convolutional and fully connected layers.
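
The three layer types named here can be sketched in a few lines of plain Python. This sketch assumes 2×2 non-overlapping pooling and "inverted" dropout, a common formulation. The essay does not state the paper's pooling size or dropout rate, and the function names are hypothetical.

```python
import random


def relu(x):
    """Element-wise rectified linear unit on a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling (stride 2); an odd last row/column is dropped."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def dropout(x, rate, training, rng=random):
    """Inverted dropout: zero each unit with probability `rate` and scale the
    survivors by 1 / (1 - rate), so no rescaling is needed at test time.
    With training=False the input is returned unchanged."""
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in row] for row in x]


feature_map = [[-1.0, 2.0, 0.5, -3.0],
               [4.0, -0.5, 1.0, 2.5]]
pooled = max_pool_2x2(relu(feature_map))
```

Dropout is active only while training. At evaluation time the full network is used, which is why the function passes its input straight through when `training` is false.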

Training used the Adam optimizer with a batch size of 100 and cross-entropy as the loss function. Models were trained for 200 epochs on ESC-50 and 100 epochs on UrbanSound8K. The order of samples in the training and test sets was randomly shuffled after each training epoch. The network was implemented in Python with TensorFlow.
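
In TensorFlow these pieces are provided by the framework. The plain-Python sketch below only shows what happens in each training step: per-epoch shuffling into batches of 100, softmax cross-entropy for one sample, and a single Adam update for one scalar parameter. The Adam hyperparameters are the commonly used defaults and are assumptions, not values taken from the paper. All function names are hypothetical.

```python
import math
import random


def minibatches(n_samples, batch_size=100, rng=random):
    """Yield shuffled lists of sample indices; call once per epoch so the
    order is reshuffled every epoch. The last batch may be smaller."""
    order = list(range(n_samples))
    rng.shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over raw class scores for one labelled sample."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1).
    Returns the updated (param, m, v)."""
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias corrections for the zero init
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# One epoch over a hypothetical 250-sample training set
batches = list(minibatches(250, 100, random.Random(0)))
```

In a full training loop, each batch's mean cross-entropy would be differentiated with respect to every network weight. Each weight would then be updated with its own Adam state, and the loop would repeat for the stated number of epochs.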
