Fire detection via effective vision transformers
Hikmat Yar, Tanveer Hussain, Zulfiqar Ahmad Khan, Mi Young Lee, Sung Wook Baik*
Key Figure

  • Access the paper
  • Published in the Journal of the Korea Next Generation Computing Society, 2021 [🌐 Online]
  • Additional Link
  • Korean Journal Link
  • Abstract
  • In today's modern age, smart and safe cities are one of the major concerns of the research community. Cities are surrounded by open areas, agricultural land, and forests, where fire incidents can threaten human lives and damage property. Recently, vision sensor-based fire detection has attracted experts in the computer vision domain, where leading performance has been achieved by various convolutional neural networks (CNNs). However, these techniques are translation invariant, locality-sensitive, and lack a global understanding of images. Furthermore, CNN-based models use pooling layers for dimensionality reduction, which reduces computational cost but also loses meaningful information, such as the precise location of the most active feature detector. In this work, we develop a Vision Transformer (ViT)-based model for fire detection and feed image patches into the transformer in a sequential structure similar to word embeddings. Experimental results demonstrate competitive performance compared to state-of-the-art CNN-based methods.
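The abstract describes feeding image patches into the transformer in a sequential structure analogous to word embeddings. Below is a minimal NumPy sketch of that patch-embedding step. The dimensions (224×224 input, 16×16 patches, 768-dimensional embeddings) follow the standard ViT-Base configuration and are assumptions for illustration, not the paper's exact settings; in a real model the projection, class token, and positional embeddings are learned parameters.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an HxWxC image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patches.append(image[i:i + patch_size, j:j + patch_size, :].reshape(-1))
    return np.stack(patches)  # shape: (num_patches, patch_size * patch_size * c)

def embed_patches(patches, d_model, rng):
    """Project patches to d_model, prepend a class token, add positional embeddings."""
    n, p = patches.shape
    proj = rng.standard_normal((p, d_model)) * 0.02     # learned in practice
    tokens = patches @ proj                             # (n, d_model)
    cls = rng.standard_normal((1, d_model)) * 0.02      # learned class token
    seq = np.concatenate([cls, tokens], axis=0)         # (n + 1, d_model)
    pos = rng.standard_normal((n + 1, d_model)) * 0.02  # learned positions
    return seq + pos

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))       # stand-in for a fire-scene frame
patches = patchify(img, 16)           # 196 patches, each of length 768
seq = embed_patches(patches, 768, rng)  # 197-token sequence for the transformer
```

The resulting token sequence is what the transformer encoder consumes; for fire detection, the final class-token representation would feed a fire/no-fire classification head.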

  • Additional Comments