A DEEP HYBRID ARCHITECTURE FOR LOW-LIGHT PERSON RECOGNITION WITH VISION TRANSFORMERS AND ADAPTIVE ENHANCEMENT
Keywords:
person recognition, low-light vision, hybrid CNN-Transformer, contrast enhancement, deep learning, computer vision

Abstract
Low-light person recognition remains a major bottleneck in intelligent surveillance and autonomous systems. Conventional convolutional neural networks (CNNs) degrade sharply under poor illumination because of low signal-to-noise ratios and reduced feature saliency. This paper presents a deep hybrid architecture that combines adaptive image enhancement with a Vision Transformer (ViT)-based recognition backbone. A modular preprocessing block performs illumination estimation and Retinex-inspired contrast correction, followed by a dual-stream CNN–ViT fusion that learns both local textures and global contextual representations. Extensive experiments on the SCface, DARK FACE, ExDark, and LLVIP datasets demonstrate consistent performance gains. The proposed model achieves 94.1% Top-1 accuracy on SCface, 72.8% mAP on DARK FACE, and an F1 score of 0.903 on ExDark, outperforming current baselines by 17–23%. Infrared–visible fusion on LLVIP further improves robustness at illumination levels below 5 lux. These results confirm the feasibility of transformer-based architectures for real-time person recognition under challenging lighting conditions.
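The pipeline described above can be illustrated with a minimal sketch: a Retinex-inspired illumination-correction module feeding a dual-stream CNN–ViT recognizer. The module names, layer sizes, and the use of off-the-shelf torchvision backbones below are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the described pipeline. Assumptions: module names,
# channel sizes, and torchvision backbones are illustrative, not the
# authors' implementation.
import torch
import torch.nn as nn
from torchvision.models import resnet18, vit_b_16


class RetinexCorrection(nn.Module):
    """Retinex-inspired contrast correction: estimate an illumination map
    with a small conv net, then divide it out of the input image."""
    def __init__(self):
        super().__init__()
        self.illum_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        illum = self.illum_net(x).clamp(min=1e-3)  # estimated illumination L
        reflectance = x / illum                    # Retinex model: I = R * L, so R = I / L
        return reflectance.clamp(0.0, 1.0)


class DualStreamRecognizer(nn.Module):
    """Dual-stream fusion: a CNN stream for local texture and a ViT stream
    for global context, concatenated before the identity classifier."""
    def __init__(self, num_identities=1000):
        super().__init__()
        self.enhance = RetinexCorrection()
        cnn = resnet18(weights=None)
        self.cnn_stream = nn.Sequential(*list(cnn.children())[:-1])  # -> (B, 512, 1, 1)
        self.vit_stream = vit_b_16(weights=None)
        self.vit_stream.heads = nn.Identity()                        # -> (B, 768)
        self.classifier = nn.Linear(512 + 768, num_identities)

    def forward(self, x):
        x = self.enhance(x)                                   # adaptive enhancement
        local_feat = self.cnn_stream(x).flatten(1)            # local texture features
        global_feat = self.vit_stream(x)                      # global contextual features
        fused = torch.cat([local_feat, global_feat], dim=1)   # dual-stream fusion
        return self.classifier(fused)


if __name__ == "__main__":
    model = DualStreamRecognizer(num_identities=100)
    dummy = torch.rand(2, 3, 224, 224)   # ViT-B/16 expects 224x224 inputs
    logits = model(dummy)
    print(logits.shape)                  # torch.Size([2, 100])
```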