
Bird-Species Audio Identification, Ensembling 1D + 2D Signals

EasyChair Preprint no. 6062

11 pages. Date: July 13, 2021


This paper describes a method for recognizing bird species in audio recordings. We use two main models: 1) a binary classifier that predicts whether a bird call is present in the audio; 2) a multiclass classifier that predicts which bird is present. Combining 1D and 2D signals gives strong results. We also experiment with ATDemucs, which extends Demucs by replacing the BiLSTM with self-attention. In the waveform domain, we first separate the sources of multiple birds, along with noise, as Universal Source Separation. We then classify each source using both a 1D waveform model (ReSEMulti with added self-attention) and a 2D spectrogram model. We also discuss a post-processing technique for handling the different thresholds of different models. Ensembling techniques such as voting and scaling gave a good boost to our results. Our combined architecture, using both 1D and 2D signals, achieves a micro-averaged F1 of 0.619 on a task that requires classifying 347 bird species.
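As a rough illustration of the post-processing and ensembling idea in the abstract, the sketch below applies a separate threshold to each model's probabilities, combines the binarized predictions by voting, and scores the result with micro-averaged F1. The function names, the thresholds, the toy probabilities, and the "at least two votes" rule are all assumptions made for this example; the abstract does not specify them, and the scaling technique it mentions is not shown.

```python
# Hypothetical sketch: names, thresholds, and the voting rule are assumptions,
# not the authors' implementation.

def binarize(probs, threshold):
    """Turn one model's per-species probabilities into 0/1 predictions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def vote(model_preds, min_votes):
    """Keep a species for a clip when at least `min_votes` models predict it."""
    n_clips = len(model_preds[0])
    n_species = len(model_preds[0][0])
    return [
        [1 if sum(m[i][j] for m in model_preds) >= min_votes else 0
         for j in range(n_species)]
        for i in range(n_clips)
    ]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all clips and species."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 2 clips x 3 species, one 1D (waveform) and one 2D (spectrogram) model.
probs_1d = [[0.9, 0.2, 0.1], [0.3, 0.6, 0.4]]
probs_2d = [[0.7, 0.4, 0.2], [0.1, 0.8, 0.6]]
y_true = [[1, 0, 0], [0, 1, 1]]

# Each model gets its own (assumed) threshold before voting.
preds = [binarize(probs_1d, 0.5), binarize(probs_2d, 0.3)]
ensemble = vote(preds, min_votes=2)
score = micro_f1(y_true, ensemble)
```

Requiring agreement from both models trades recall for precision on this toy data; a lower `min_votes` would do the opposite.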

Keyphrases: Attention Mechanism, Audio Source Detection, Bird Species Classification, deep learning, Demucs, Efficient Net, Ensembling, Multi Domain Meta Training, sound detection, spectrogram model, Transfer Learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:6062,
  author = {Gyanendra Das and Saksham Aggarwal},
  title = {Bird-Species Audio Identification, Ensembling 1D + 2D Signals},
  howpublished = {EasyChair Preprint no. 6062},

  year = {EasyChair, 2021}}