Open Access | https://doi.org/10.55640/eijmrms-05-07-04

Temporal Modeling and Real-Time Recognition Approaches in SLR Systems

Kayumov Oybek Achilovich, Jizzakh Branch of the National University of Uzbekistan named after Mirzo Ulugbek, Uzbekistan

Abstract

This article analyzes advanced approaches to temporal modeling and real-time gesture recognition in sign language recognition (SLR) systems. Sign glosses are realized through the spatio-temporal dynamics of visual information, so their automatic recognition requires sequence-processing models. The study primarily evaluates the effectiveness of three key model families: Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCN), and Transformer-based architectures.
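To make the sequence-processing setting concrete, the following is a minimal sketch of an LSTM-based gloss classifier over per-frame keypoint features (e.g., as extracted by MediaPipe Holistic). The feature dimension, hidden size, and gloss count are illustrative assumptions, not the article's exact configuration:

```python
# Minimal LSTM gloss classifier over per-frame keypoint features.
# All dimensions (feat_dim, hidden_dim, num_glosses) are illustrative.
import torch
import torch.nn as nn

class LSTMGlossClassifier(nn.Module):
    def __init__(self, feat_dim=150, hidden_dim=256, num_glosses=100):
        super().__init__()
        # A bidirectional LSTM captures forward and backward temporal context.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_glosses)

    def forward(self, x):             # x: (batch, frames, feat_dim)
        out, _ = self.lstm(x)         # (batch, frames, 2 * hidden_dim)
        return self.head(out[:, -1])  # classify from the final time step

model = LSTMGlossClassifier()
clips = torch.randn(4, 32, 150)      # 4 clips, 32 frames, 150 features each
logits = model(clips)                 # (4, 100) gloss scores
```

A TCN or Transformer encoder could be substituted for the LSTM in the same pipeline; the input and output interfaces stay identical.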

The article also examines methods applied for real-time analysis of sign glosses, including the following (a minimal sliding-window sketch follows the list):

Sliding window segmentation of video streams;

Self-attention mechanisms for identifying dependencies between gestures;

Gloss mapping algorithms for linking sign movements to linguistic units;

Ontological integration techniques for enhancing semantic accuracy.
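The first of these, sliding-window segmentation, can be illustrated as follows. The window length and stride below are hypothetical tuning parameters, and the merging of overlapping predictions is only indicated, not implemented:

```python
# Illustrative sliding-window segmentation of a continuous frame-feature
# stream; window and stride values are hypothetical tuning parameters.
import torch

def sliding_windows(stream, window=32, stride=8):
    """Yield overlapping windows from a (frames, feat_dim) tensor."""
    for start in range(0, stream.shape[0] - window + 1, stride):
        yield start, stream[start:start + window]

stream = torch.randn(300, 150)  # ~10 s of video at 30 fps, 150 features/frame
for start, clip in sliding_windows(stream):
    # Each window would be fed to the temporal model (e.g., the LSTM
    # classifier sketched above), and overlapping window predictions
    # would be merged before gloss mapping and semantic verification.
    pass
```

Overlapping windows trade extra computation for lower latency: a prediction becomes available every `stride` frames rather than once per full sign.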

Practical results indicate that combining temporal modeling with semantic analysis and contextual verification algorithms enables continuous, high-accuracy recognition of sign movements. In particular, multimodal systems (video + sensor + gloss) built on Transformer-based architectures achieved the best performance in real-time conversion of continuous sign gloss streams into text.
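A minimal sketch of such Transformer-based multimodal fusion is given below, assuming time-aligned video and sensor streams. The projection sizes, modality dimensions, and per-frame gloss output are assumptions for illustration, not the architecture evaluated in the article:

```python
# Hypothetical Transformer fusion of time-aligned video and sensor
# features into per-frame gloss logits; all sizes are assumptions.
import torch
import torch.nn as nn

class MultimodalSLR(nn.Module):
    def __init__(self, video_dim=150, sensor_dim=12,
                 d_model=128, num_glosses=100):
        super().__init__()
        # Project each modality into a shared embedding space, sum the
        # aligned embeddings, and let self-attention relate time steps.
        self.video_proj = nn.Linear(video_dim, d_model)
        self.sensor_proj = nn.Linear(sensor_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_glosses)

    def forward(self, video, sensor):           # each: (batch, T, *_dim)
        fused = self.video_proj(video) + self.sensor_proj(sensor)
        return self.head(self.encoder(fused))   # (batch, T, num_glosses)

model = MultimodalSLR()
v = torch.randn(2, 32, 150)   # video keypoint features
s = torch.randn(2, 32, 12)    # e.g., glove/IMU sensor readings
print(model(v, s).shape)      # torch.Size([2, 32, 100])
```

The per-frame logits would then pass through gloss mapping and contextual verification to produce the final text output.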

The findings of this study hold practical significance for the development of smart assistive devices for automatic sign language translation, interactive interfaces for hearing-impaired users, and specialized SLR platforms for educational and instructional purposes.

Keywords

Sign Language Recognition, temporal modeling, real-time SLR systems

References

Camgoz, N. C., Hadfield, S., Koller, O., Ney, H., & Bowden, R. (2018). Neural Sign Language Translation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7784–7793.

Cui, R., Liu, H., & Zhang, C. (2019). A deep learning approach to continuous sign language recognition by iterative training. International Journal of Computer Vision, 127(11–12), 1690–1705.

Hu, H., Zhou, W., Li, H., & Li, W. (2023). SignBERT+: Hand-model-aware self-supervised pretraining for sign language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5), 5678–5692.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS), 30, 5998–6008.

Liu, J., Liang, H., Li, L., & Jiang, X. (2020). FrameNet-based semantic analysis for continuous sign language recognition. Pattern Recognition Letters, 131, 296–302.

Saunders, B., Camgoz, N. C., & Bowden, R. (2020). Progressive Transformers for End-to-End Sign Language Production. Proceedings of the European Conference on Computer Vision (ECCV), 687–705.

Zuo, Z., Fang, Y., & Wang, S. (2023). MS2SL: Multisource-to-Sign-Language model for synchronized multimodal sign recognition. Computer Vision and Image Understanding, 228, 103610.

Baltrušaitis, T., Ahuja, C., & Morency, L. P. (2019). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.

Koller, O., Zargaran, S., Ney, H., & Bowden, R. (2020). Quantifying Translation Quality of Sign Language Recognition Systems on PHOENIX14T. European Conference on Computer Vision (ECCV), 477–494.

Google Research. (2021). MediaPipe Holistic: Simultaneous face, hand, and body pose detection. Retrieved from https://google.github.io/mediapipe


How to Cite

Kayumov Oybek Achilovich. (2025). Temporal Modeling and Real-Time Recognition Approaches in SLR Systems. European International Journal of Multidisciplinary Research and Management Studies, 5(07), 28–31. https://doi.org/10.55640/eijmrms-05-07-04