Dilated Convolution to Capture Scale Invariant Context in Crowd Density Estimation

15 pages • Published: July 18, 2022

Abstract

Crowd Density Estimation (CDE) can be used to ensure the safety of crowds by preventing stampedes or reducing the spread of disease, a need made urgent by the rise of Covid-19. CDE is a challenging problem because of occlusion and massive scale variations. This research creates, evaluates and compares different approaches to crowd counting, focusing on the ability of dilated convolution to extract scale-invariant contextual information. In this work we build and train three model architectures: a Convolutional Neural Network (CNN) without dilation, a CNN with dilation to capture context, and a CNN with an Atrous Spatial Pyramid Pooling (ASPP) layer to capture scale-invariant contextual features. We train each architecture multiple times to ensure statistical significance and evaluate them using the Mean Squared Error (MSE), Mean Absolute Error (MAE) and Grid Average Mean Absolute Error (GAME) on the ShanghaiTech and UCF_CC_50 datasets. Comparing the results, we find that applying dilated convolution to sparser crowd images with little scale variation does not make a significant difference but that, on highly congested crowd images, dilated convolutions are more resilient to occlusion and perform better. Furthermore, we find that adding an ASPP layer improves performance when there are significant differences in the scale of objects within the crowds. The code for this research is available at https://github.com/ThishenP/crowd-density.

Keyphrases: atrous spatial pyramid pooling, convolutional neural networks, crowd density estimation, dilated convolution

In: Aurona Gerber (editor). Proceedings of 43rd Conference of the South African Institute of Computer Scientists and Information Technologists, vol 85, pages 89-103.
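
The abstract relies on the fact that dilation widens the context a convolution sees without adding weights. The sketch below illustrates that idea in one dimension using only the Python standard library. It is not taken from the paper or its repository: the function names, the toy signal and the dilation rates are illustrative assumptions, and the rates are not the ones used in the authors' ASPP layer.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel whose taps are spaced `dilation` apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """1-D cross-correlation without padding, using a dilated kernel.

    The number of weights stays len(kernel); only the spacing between
    the input samples they read changes with `dilation`.
    """
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        total = 0
        for i, weight in enumerate(kernel):
            total += weight * signal[start + i * dilation]
        outputs.append(total)
    return outputs


# Hypothetical toy input: a ramp, filtered by a 3-tap difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

# Illustrative dilation rates only (assumed, not the paper's settings).
for rate in (1, 2):
    span = effective_kernel_size(len(kernel), rate)
    print(rate, span, dilated_conv1d(signal, kernel, rate))
```

With the same three weights, a rate of 1 compares samples two positions apart, while a rate of 2 compares samples four positions apart. An ASPP-style layer runs several such rates in parallel on the same input and combines the results, which is how it gathers context at multiple scales.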