Bird-Species Audio Identification, Ensembling 1D + 2D Signals

EasyChair Preprint 6062

11 pages • Date: July 13, 2021

Abstract

This paper describes a method for recognizing bird species in audio recordings. We use two main models: 1) a binary classifier that predicts whether a bird call is present in the audio; 2) a multiclass classifier that predicts which bird is present. Combining 1D and 2D signals gives strong results. We also experiment with ATDemucs, which extends Demucs by replacing the BiLSTM with self-attention. In the waveform domain, we first perform source separation of multiple birds, together with noise separation, as Universal Source Separation. We then classify each source using both a 1D waveform model (ReSEMulti with added self-attention) and a 2D spectrogram model. We also discuss a post-processing technique for handling the different thresholds of different models. Ensembling techniques such as Voting and Scaling gave a good boost to our results. Our combined architecture, which uses both 1D and 2D signals, achieves a micro-averaged F1 of 0.619 on a task that asked for classification of 347 bird species.
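The abstract mentions per-model thresholds, voting-based ensembling, and a micro-averaged F1 score without giving details. The following is a minimal Python sketch of how such pieces could fit together, not the authors' implementation. The function names, the species labels, the threshold values, and the "at least `min_votes` models agree" rule are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def threshold_predictions(probs: Dict[str, float], threshold: float) -> Set[str]:
    """Keep the labels whose probability reaches this model's own threshold."""
    return {label for label, p in probs.items() if p >= threshold}


def vote(model_probs: List[Dict[str, float]],
         thresholds: List[float],
         min_votes: int) -> Set[str]:
    """Hypothetical voting rule: keep a label if at least `min_votes` models predict it."""
    votes: Dict[str, int] = {}
    for probs, thr in zip(model_probs, thresholds):
        for label in threshold_predictions(probs, thr):
            votes[label] = votes.get(label, 0) + 1
    return {label for label, n in votes.items() if n >= min_votes}


def micro_f1(predicted: List[Set[str]], actual: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all clips."""
    tp = sum(len(p & a) for p, a in zip(predicted, actual))
    fp = sum(len(p - a) for p, a in zip(predicted, actual))
    fn = sum(len(a - p) for p, a in zip(predicted, actual))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up probabilities for one clip from three hypothetical models
# (e.g. a 1D waveform model, a 2D spectrogram model, and a third model).
clip_probs = [
    {"amerob": 0.7, "houspa": 0.2},
    {"amerob": 0.4, "houspa": 0.35},
    {"amerob": 0.6, "houspa": 0.1},
]
# Each model gets its own threshold (values are illustrative only).
clip_thresholds = [0.5, 0.3, 0.5]

ensemble_labels = vote(clip_probs, clip_thresholds, min_votes=2)
print(ensemble_labels)

# Score two clips against made-up ground truth.
score = micro_f1(
    predicted=[{"amerob"}, {"houspa"}],
    actual=[{"amerob"}, {"houspa", "norcar"}],
)
print(round(score, 3))
```

Applying a separate threshold per model before voting is one simple way to reconcile models whose output probabilities are calibrated differently; the preprint's own post-processing may differ.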

Keyphrases: Attention Mechanism, Audio Source Detection, Bird Species Classification, Demucs, Efficient Net, Ensembling, Multi Domain Meta Training, Transfer Learning, deep learning, sound detection, spectrogram model

Links:

https://easychair.org/publications/preprint/dXRH

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:6062,
  author    = {Gyanendra Das and Saksham Aggarwal},
  title     = {Bird-Species Audio Identification, Ensembling 1D + 2D Signals},
  howpublished = {EasyChair Preprint 6062},
  year      = {EasyChair, 2021}}
