Proposal of an open-source accelerators library for inference of transformer networks in edge devices based on Linux
Abstract
Transformer networks have been a major milestone in natural language processing and power technologies such as ChatGPT, which are undeniably changing people’s lives. This article discusses the characteristics and computational complexity of Transformer networks, as well as the potential for improving their performance in low-resource environments through the use of hardware accelerators. This research has the potential to significantly improve the performance of Transformers on edge and low-end devices. In addition, Edge Artificial Intelligence, Hardware Acceleration, and Tiny Machine Learning algorithms are explored. The proposed methodology comprises a software layer and a hardware layer, with a minimal Linux-based image built on top of a synthesized RTL design. The proposal also includes a library of hardware accelerators that can be customized to select the desired accelerators according to the device’s resources and the operations to be accelerated.
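As an illustration of how such a Linux software layer might drive one of the accelerators in the proposed library, the sketch below maps a hypothetical memory-mapped matrix-multiply block into user space through /dev/mem and triggers a single operation. The base address, register layout, and the function name matmul_accel_run are assumptions made for this sketch, not details taken from the proposal.

```c
/* Minimal sketch of a Linux user-space driver for a hypothetical
 * memory-mapped matrix-multiply accelerator. The base address and
 * register layout below are illustrative assumptions only. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define ACCEL_BASE   0x43C00000u  /* assumed physical base of the accelerator */
#define REG_CTRL     0x00         /* write 1 to start */
#define REG_STATUS   0x04         /* bit 0 set when done */
#define REG_SRC_A    0x08         /* physical address of operand A */
#define REG_SRC_B    0x0C         /* physical address of operand B */
#define REG_DST      0x10         /* physical address of the result */

static volatile uint32_t *map_accel(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, ACCEL_BASE);
    close(fd);
    return (regs == MAP_FAILED) ? NULL : (volatile uint32_t *)regs;
}

/* Launch one matrix multiplication and poll until the block reports completion. */
static void matmul_accel_run(volatile uint32_t *regs,
                             uint32_t a_phys, uint32_t b_phys, uint32_t c_phys)
{
    regs[REG_SRC_A / 4] = a_phys;
    regs[REG_SRC_B / 4] = b_phys;
    regs[REG_DST  / 4] = c_phys;
    regs[REG_CTRL / 4] = 1;                    /* start */
    while ((regs[REG_STATUS / 4] & 1) == 0)    /* wait for done bit */
        ;
}

int main(void)
{
    volatile uint32_t *regs = map_accel();
    if (!regs) {
        perror("mapping accelerator registers");
        return 1;
    }
    /* Physical buffer addresses would normally come from a DMA allocator;
     * the constants here are placeholders for illustration. */
    matmul_accel_run(regs, 0x10000000u, 0x10100000u, 0x10200000u);
    puts("accelerated matrix multiply finished");
    return 0;
}
```

In a customizable library of this kind, one such thin wrapper per accelerated operation (e.g., the matrix multiplications of attention, softmax, or layer normalization) would be compiled in only when the corresponding accelerator is selected for the target device.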
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, together with the right to edit, reproduce, distribute, display, and communicate the article in print and electronic media, both nationally and abroad. They likewise assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. In addition, the authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., depositing it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.