High Dimensional Spatial Information and Multi-scale Fusion Network for Efficient and Real-Time Small Object Detection in Remote Sensing Images

Li, Haochen; Tao, Hongfeng; Qiu, Jier; Stojanović, Vladimir

Please use this identifier to cite or link to this item: https://scidar.kg.ac.rs/handle/123456789/23113

Full metadata record

DC Field	Value	Language
dc.contributor.author	Li, Haochen	-
dc.contributor.author	Tao, Hongfeng	-
dc.contributor.author	Qiu, Jier	-
dc.contributor.author	Stojanović, Vladimir	-
dc.date.accessioned	2026-04-08T12:38:45Z	-
dc.date.available	2026-04-08T12:38:45Z	-
dc.date.issued	2026	-
dc.identifier.issn	2631-8695	en_US
dc.identifier.uri	https://scidar.kg.ac.rs/handle/123456789/23113	-
dc.description.abstract	The detection of small objects in remote sensing imagery remains a formidable challenge due to their minimal pixel occupancy, blurred structural boundaries, and susceptibility to environmental interference. To address these problems, this paper proposes a novel network architecture named multidimensional information feature fusion-you only look once (MIFF-YOLO), which integrates several specialized modules. To address the challenge of small objects being obscured by complex environmental factors, we propose a multidimensional information fusion (MIF) module for the neck network, which leverages a 3D convolution and a full-domain transformer (FDT) to model cross-scale dependencies and integrate global contextual information with local details. To preserve the spatial and edge information of small objects, an efficient front-end module (EFEM) is embedded into the C3k2 block. The EFEM employs a parallel, learnable dual-path design that combines a Sobel convolution stream for explicit edge detection with a max-pooling spatial information stream for detail preservation, enabling simultaneous extraction of structural boundaries and contextual textures. These complementary features are then adaptively fused via omni-dimensional dynamic convolution (ODConv), enriching the feature representation. To address the loss of critical small-object details during upsampling, a dynamic upconvolution block (DUB) is introduced to replace the standard upsampling module. Adaptive feature sampling is achieved through content-aware dynamic offsets, mitigating detail loss during resolution recovery. On the DOTA dataset, the improved network achieves improvements of 3.7% in mAP@50 and 3.9% in mAP@50:95 over the original baseline algorithm while running at 120 FPS. These results show that the improved algorithm effectively enhances small object detection in remote sensing images while maintaining real-time detection efficiency.	en_US
dc.language.iso	en	en_US
dc.relation	451-03-34/2026-03/200108	en_US
dc.relation.ispartof	Engineering Research Express	en_US
dc.subject	Remote sensing object detection	en_US
dc.subject	Multidimensional information	en_US
dc.subject	Small object detection	en_US
dc.subject	Spatial details	en_US
dc.subject	YOLO11	en_US
dc.title	High Dimensional Spatial Information and Multi-scale Fusion Network for Efficient and Real-Time Small Object Detection in Remote Sensing Images	en_US
dc.type	article	en_US
dc.description.version	Author's version	en_US
dc.identifier.doi	10.1088/2631-8695/ae590f	en_US
dc.type.version	PublishedVersion	en_US
Appears in Collections:	Faculty of Mechanical and Civil Engineering, Kraljevo
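The abstract describes the EFEM as two parallel streams, a Sobel edge stream and a max-pooling detail stream, whose outputs are then fused. Below is a minimal single-channel sketch of that dual-path idea in plain Python, assuming 3x3 kernels, stride-1 pooling, and replicate padding; it is not the authors' implementation. The fusion step is a fixed weighted sum standing in for ODConv, and the names `sobel_edges`, `max_pool_3x3`, `dual_path_fuse`, `w_edge`, and `w_detail` are hypothetical.

```python
from typing import List

Grid = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def _pad(img: Grid) -> Grid:
    """Replicate-pad a 2D grid by one pixel on each side."""
    rows = [[row[0]] + row + [row[-1]] for row in img]
    return [rows[0]] + rows + [rows[-1]]


def sobel_edges(img: Grid) -> Grid:
    """Edge stream: gradient magnitude from fixed 3x3 Sobel filters."""
    p = _pad(img)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = sum(SOBEL_X[a][b] * p[i + a][j + b] for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * p[i + a][j + b] for a in range(3) for b in range(3))
            out[i][j] = (gx * gx + gy * gy) ** 0.5
    return out


def max_pool_3x3(img: Grid) -> Grid:
    """Detail stream: stride-1 3x3 max pooling, so the spatial size is kept."""
    p = _pad(img)
    h, w = len(img), len(img[0])
    return [[max(p[i + a][j + b] for a in range(3) for b in range(3))
             for j in range(w)] for i in range(h)]


def dual_path_fuse(img: Grid, w_edge: float = 0.5, w_detail: float = 0.5) -> Grid:
    """Fuse both streams with a weighted sum.

    Hypothetical stand-in for the ODConv fusion: w_edge and w_detail play the
    role of learnable weights but are fixed scalars in this sketch.
    """
    e, d = sobel_edges(img), max_pool_3x3(img)
    return [[w_edge * e[i][j] + w_detail * d[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]


if __name__ == "__main__":
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 1.0  # a one-pixel "small object"
    print(dual_path_fuse(img)[2])  # -> [0.0, 1.5, 0.5, 1.5, 0.0]
```

On a one-pixel object, the Sobel stream outlines the object's border but is zero at its center, while max pooling keeps (and slightly dilates) the object itself. This is one way to see why the two streams can be complementary before fusion.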


Files in This Item:

File	Description	Size	Format
ERX_2026_1.pdf (restricted access)		225.33 kB	Adobe PDF


SCIDAR - A Digital Archive of the University of Kragujevac
