Comparison of Time-Frequency Representations for Environmental Sound Classification Using Convolutional Neural Networks

My essay examines the paper by Muhammad Huzaifah titled "Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks". The paper introduces the importance of environmental sound classification and shows how it can be performed using convolutional neural networks (CNNs).

Although CNNs are best known for image and video applications, transforming 1-D audio into a 2-D spectrogram image makes it possible to apply them to audio processing. The author uses several signal processing methods, namely the short-time Fourier transform (STFT) with linear and Mel scales, the constant-Q transform (CQT) and the continuous wavelet transform (CWT), to observe their impact on classification performance on environmental sound datasets.

The paper supports the hypothesis that the features vital for sound classification are highly dependent on the time-frequency representation. Moreover, the appropriate sliding window length of the STFT depends on the characteristics of the audio signal, and 2-D convolution performs better than 1-D convolution.
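The window-length trade-off can be made concrete with a little arithmetic. The following is a minimal sketch, not the paper's code: the sample rate and the two window lengths are illustrative assumptions, and `stft_resolution` is a hypothetical helper name.

```python
def stft_resolution(window_length, sample_rate):
    """Return (window duration in ms, bin spacing in Hz, one-sided bin count)."""
    duration_ms = 1000.0 * window_length / sample_rate
    bin_spacing_hz = sample_rate / window_length
    n_bins = window_length // 2 + 1
    return duration_ms, bin_spacing_hz, n_bins


SAMPLE_RATE = 22050  # assumed value, not taken from the paper

# Two hypothetical window lengths: a short one and a long one.
short_win = stft_resolution(256, SAMPLE_RATE)
long_win = stft_resolution(2048, SAMPLE_RATE)

for name, (ms, hz, bins) in (("short", short_win), ("long", long_win)):
    print(f"{name}: {ms:.1f} ms per window, {hz:.1f} Hz per bin, {bins} bins")
```

A short window localizes brief events well in time but gives coarse frequency bins, while a long window does the opposite, which is why the best length depends on the signal.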

Since the author uses CNNs for classification, conventional choices such as Mel-frequency cepstral coefficients (MFCCs) or Perceptual Linear Prediction (PLP) coefficients, which previously served as basic building blocks for Gaussian mixture model (GMM)-based Hidden Markov Models (HMMs), become redundant. The reason is that the feature maps learned by deep networks do not rely on MFCC or PLP features, whose decorrelated coefficients were needed mainly by GMM-based models.

The paper builds on previous studies in which the performance of the short-time Fourier transform (STFT), fast wavelet transform (FWT) and continuous wavelet transform (CWT) was compared with the conventional machine learning techniques mentioned above, and it goes deeper into the specifics of a CNN model.

The author applies the STFT on both linear and Mel scales, the CQT and the CWT, and assesses these approaches against baseline MFCC features on two publicly available environmental sound datasets (ESC-50 and UrbanSound8K) through the classification performance of several CNN variants.
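To illustrate what moving from a linear to a Mel frequency scale means, here is a small sketch. It assumes the widely used HTK-style formula, which may differ from the variant used in the paper; the frequency range and band count are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style Mel mapping (an assumed variant; others exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced on the Mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# Hypothetical range and band count, chosen only for illustration.
edges = mel_band_edges(0.0, 11025.0, 8)
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(w) for w in widths])  # spacing in Hz grows with frequency
```

Bands that are evenly spaced in Mel are narrow at low frequencies and wide at high frequencies, so a Mel-scaled spectrogram devotes more of its rows to the low end than a linear one does.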

The ESC-50 and UrbanSound8K datasets are collections of short environmental recordings divided into distinct classes such as animal sounds, human non-speech sounds, car horns and drilling. In the pre-processing part of the experiment, four time-frequency representations were extracted in addition to the MFCC cepstrogram: a linear-scaled STFT spectrogram, a Mel-scaled STFT spectrogram, a CQT spectrogram and a CWT scalogram.

The procedure for the other transforms was similar to that used for the STFT. The CQT can be thought of as a series of logarithmically spaced filters f_k, with the k-th filter having a spectral width δf_k equal to a multiple of the previous filter's width:

δf_k = 2^(1/n) · δf_(k-1) = (2^(1/n))^k · δf_min

where δf_k is the bandwidth of the k-th filter, f_min is the central frequency of the lowest filter, and n is the number of filters per octave.
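The geometric spacing of the constant-Q filters can be checked numerically. This is a sketch under stated assumptions: the lowest frequency, the number of filters per octave and the helper name `cqt_filters` are all illustrative, and the bandwidth is taken as δf_k = f_k / Q with the standard constant Q = 1 / (2^(1/n) - 1).

```python
def cqt_filters(f_min, n_per_octave, n_filters):
    """Return (centre frequencies, bandwidths) of a constant-Q filter bank."""
    ratio = 2.0 ** (1.0 / n_per_octave)  # step between neighbouring filters
    q = 1.0 / (ratio - 1.0)              # constant quality factor f_k / bandwidth_k
    centres = [f_min * ratio ** k for k in range(n_filters)]
    bandwidths = [f / q for f in centres]
    return centres, bandwidths


# Hypothetical settings: 12 filters per octave starting at 32.7 Hz.
centres, bandwidths = cqt_filters(32.7, 12, 25)

print(round(centres[12] / centres[0], 6))        # one octave up
print(round(bandwidths[1] / bandwidths[0], 6))   # equals 2 ** (1 / 12)
```

Each bandwidth is 2^(1/n) times the previous one, so both the centre frequency and the bandwidth double after n filters while their ratio Q stays fixed.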

Like the STFT, wideband and narrowband versions of the CQT were extracted. The CWT decomposes the signal into wavelets instead of sinusoids; it was specified with 256 frequency bins and a Morlet mother function, which has been used in previous audio recognition studies. Finally, MFCCs were computed and arranged as a cepstrogram, and the coefficients were normalized without taking the logarithm.
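To show how a Morlet wavelet picks out frequency content, the sketch below computes one row of a wavelet transform by direct summation in pure Python. It is only an illustration: the real-valued Morlet form exp(-t²/2)·cos(5t), the test tone, the sampling step and the three candidate frequencies are assumptions, and the paper itself relied on a library implementation rather than code like this.

```python
import math


def morlet(t):
    """Real-valued Morlet mother wavelet (one common form)."""
    return math.exp(-t * t / 2.0) * math.cos(5.0 * t)


def cwt_row(signal, scale, dt):
    """Wavelet coefficients at one scale, for positions where the kernel fits."""
    half = int(math.ceil(4.0 * scale / dt))  # wavelet is negligible beyond 4 scales
    kernel = [morlet(k * dt / scale) / math.sqrt(scale) for k in range(-half, half + 1)]
    row = []
    for b in range(half, len(signal) - half):
        acc = sum(signal[b + k] * kernel[k + half] for k in range(-half, half + 1))
        row.append(acc * dt)
    return row


def scale_for(freq_hz):
    """Scale at which cos(5t) oscillates at freq_hz."""
    return 5.0 / (2.0 * math.pi * freq_hz)


dt = 0.001  # assumed sampling step (1 kHz)
signal = [math.cos(2.0 * math.pi * 50.0 * i * dt) for i in range(500)]  # 50 Hz tone

candidates_hz = [25.0, 50.0, 100.0]
response = {f: max(abs(c) for c in cwt_row(signal, scale_for(f), dt)) for f in candidates_hz}
best_hz = max(candidates_hz, key=lambda f: response[f])
print(best_hz)
```

Stacking such rows over many scales (256 in the paper) gives the scalogram that is fed to the network as an image.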

To keep the input feature maps consistent, all images were further downscaled with PIL using Lanczos resampling, which also helped achieve higher processing speeds. The Python libraries librosa and PyWavelets were used for audio processing. On the neural network side, two types of convolutional filters were considered: a 3×3 square filter and an M×3 rectangular filter, which implements 1-D convolution over time.
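The difference between the two filter shapes is easiest to see from the output sizes. The sketch below assumes that M is the number of frequency bins in the input image, that no padding is applied, and that the input size is 64×128; none of these values are confirmed by the text above, and `conv_output_shape` is a hypothetical helper.

```python
def conv_output_shape(input_hw, filter_hw, stride=1):
    """Output (height, width) of a convolution with no padding."""
    (height, width), (f_height, f_width) = input_hw, filter_hw
    return ((height - f_height) // stride + 1, (width - f_width) // stride + 1)


M, T = 64, 128  # assumed input size: M frequency bins by T time frames

square_out = conv_output_shape((M, T), (3, 3))  # slides over frequency and time
tall_out = conv_output_shape((M, T), (M, 3))    # spans all frequencies, slides over time

print(square_out, tall_out)
print("weights per filter per input channel:", 3 * 3, "vs", M * 3)
```

Because the tall filter covers the whole frequency axis at once, it can only move along time, which is why it behaves as a 1-D convolution, whereas the square filter moves along both axes.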

The convolutional layers were interspersed with rectified linear unit (ReLU) and max pooling layers. Because overfitting hinders the performance of the model, dropout was used during training after the first convolutional and fully connected layers.
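The three operations named here are simple enough to sketch on a plain list of numbers. This is an illustrative 1-D version, not the network from the paper: the pool size, the dropout rate of 0.5 and the function names are assumptions, and the dropout shown is the common "inverted" variant that rescales the surviving activations.

```python
import random


def relu(values):
    """Rectified linear unit: negative activations become zero."""
    return [max(0.0, v) for v in values]


def max_pool_1d(values, size):
    """Keep the largest value in each non-overlapping window of length size."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def dropout(values, rate, rng, training=True):
    """Zero each value with probability rate and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = [-1.5, 0.5, 2.0, -0.25, 3.0, 1.0]
pooled = max_pool_1d(relu(activations), 2)
print(pooled)
print(dropout(pooled, 0.5, rng))                  # random subset zeroed while training
print(dropout(pooled, 0.5, rng, training=False))  # unchanged at test time
```

Randomly silencing units during training discourages the network from relying on any single activation, which is how dropout counters overfitting.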

Training was performed using Adam optimization with a batch size of 100 and cross-entropy as the loss function. Models were trained for 200 epochs on ESC-50 and 100 epochs on UrbanSound8K. The order of samples in the training and test sets was randomly shuffled after each training epoch. The network was implemented in Python with TensorFlow.
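Two pieces of this training setup, the cross-entropy loss and the per-epoch shuffling into batches of 100, can be sketched without any deep learning library. The sample count, the example logits and the function names below are hypothetical; the actual model was trained in TensorFlow with Adam, which is not reproduced here.

```python
import math
import random


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and the true class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def epoch_batches(n_samples, batch_size, rng):
    """Shuffle sample indices and split them into consecutive batches."""
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]


rng = random.Random(0)
BATCH_SIZE = 100  # as stated in the text
N_SAMPLES = 250   # hypothetical training-set size

for epoch in range(2):  # the paper used 200 or 100 epochs; 2 here for brevity
    batches = epoch_batches(N_SAMPLES, BATCH_SIZE, rng)
    print(epoch, [len(b) for b in batches], batches[0][:5])

print(softmax_cross_entropy([2.0, 0.5, -1.0], 0))  # small loss: class 0 has the top logit
```

Reshuffling every epoch means each batch contains a different mix of samples, so the gradient estimates are not tied to one fixed ordering of the data.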
