Citation: Ye Tian, Shuiying Xiang, Xingxing Guo, Yahui Zhang, Jiashang Xu, Shangxuan Shi, Haowen Zhao, Yizhi Wang, Xinran Niu, Wenzhuo Liu, Yue Hao. Photonic transformer chip: interference is all you need. PhotoniX. doi: 10.1186/s43074-025-00182-7
[1] Vaswani A, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc.; 2017. p. 6000–10.

[2] Dosovitskiy A, et al. An image is worth 16x16 words: transformers for image recognition at scale. ArXiv. 2020;abs/2010.11929.

[3] Mehta S, Rastegari M. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer. ArXiv. 2021;abs/2110.02178.

[4] Carion N, et al. End-to-end object detection with transformers. Cham: Springer International Publishing; 2020.

[5] Devlin J, et al. Bert: pre-training of deep bidirectional transformers for language understanding. In: North American Chapter of the Association for Computational Linguistics. 2019.

[6] Achiam J, et al. Gpt-4 technical report. 2023.

[7] Xie T, et al. Vit-mvt: a unified vision transformer network for multiple vision tasks. IEEE Trans Neural Netw Learn Syst. 2023:1–15.

[8] Howard J, Ruder S. Fine-tuned language models for text classification. ArXiv. 2018;abs/1801.06146.

[9] Tripp CE, et al. Measuring the energy consumption and efficiency of deep neural networks: an empirical analysis and design recommendations. ArXiv. 2024;abs/2403.08151.

[10] Berggren K, et al. Roadmap on emerging hardware and technology for machine learning. Nanotechnology. 2021;32(1):012002.

[11] Totović AR, et al. Femtojoule per mac neuromorphic photonics: an energy and technology roadmap. IEEE J Sel Top Quantum Electron. 2020;26(5):1–15.

[12] Shen Y, et al. Deep learning with coherent nanophotonic circuits. Nat Photon. 2017;11(7):441–6.

[13] Tian Y, et al. Scalable and compact photonic neural chip with low learning-capability-loss. Nanophotonics. 2022;11(2):329–44.

[14] Bandyopadhyay S, et al. Single-chip photonic deep neural network with forward-only training. Nat Photon. 2024;18(12):1335–43.

[15] Feldmann J, et al. Parallel convolutional processing using an integrated photonic tensor core. Nature. 2021;589(7840):52–8.

[16] Shekhar S, et al. Roadmapping the next generation of silicon photonics. Nat Commun. 2024;15(1):751.

[17] Ramey C. Silicon photonics for artificial intelligence acceleration: HotChips 32. In: 2020 IEEE Hot Chips 32 Symposium (HCS). 2020.

[18] Liu J, et al. Research progress in optical neural networks: theory, applications and developments. PhotoniX. 2021;2(1):5.

[19] Inagaki T, et al. Collective and synchronous dynamics of photonic spiking neurons. Nat Commun. 2021;12(1):2325.

[20] Feldmann J, et al. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature. 2019;569(7755):208–14.

[21] Xiang S, et al. Semiconductor lasers for photonic neuromorphic computing and photonic spiking neural networks: a perspective. APL Photonics. 2024. https://doi.org/10.1063/5.0217968.

[22] Xiang S, et al. Hardware-algorithm collaborative computing with photonic spiking neuron chip based on an integrated Fabry–Perot laser with a saturable absorber. Optica. 2023;10(2):162–71.

[23] Xiang S, et al. Photonic integrated neuro-synaptic core for convolutional spiking neural network. Opto-Electron Adv. 2023;6(11):230140.

[24] De Marinis L, et al. Photonic neural networks: a survey. IEEE Access. 2019;7:175827–41.

[25] Wright LG, et al. Deep physical neural networks trained with backpropagation. Nature. 2022;601(7894):549–55.

[26] Zhu H, et al. Dota: a dynamically-operated photonic tensor core for energy-efficient transformer accelerator. Optica Open. 2023.

[27] Gu J, et al. Towards area-efficient optical neural networks: an fft-based architecture. In: 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). 2020.

[28] Feng C, et al. Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning. Nanophotonics. 2024;13(12):2193–206.

[29] Zhou H, et al. Photonic matrix multiplication lights up photonic accelerator and beyond. Light Sci Appl. 2022;11(1):30.

[30] Afifi S, et al. Tron: transformer neural network acceleration with non-coherent silicon photonics. In: Proceedings of the Great Lakes Symposium on VLSI 2023. Knoxville: Association for Computing Machinery; 2023. p. 15–21.

[31] Anderson MG, et al. Optical transformers. Trans Mach Learn Res. 2024.

[32] Sha H, et al. Vision transformer with photonic integrated circuits. In: SPIE/COS Photonics Asia. SPIE. 2024;13236.

[33] Dong B, et al. Partial coherence enhances parallelized photonic computing. Nature. 2024;632(8023):55–62.

[34] Tait AN, et al. Microring weight banks. IEEE J Sel Top Quantum Electron. 2016;22(6):312–25.

[35] Demirkıran C, et al. Mirage: an rns-based photonic accelerator for dnn training. In: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 2024. p. 73–87.

[36] Grundmann M. Kramers–Kronig relations. In: The physics of semiconductors: an introduction including nanophysics and applications. Berlin, Heidelberg: Springer; 2010. p. 775–6.

[37] Xu X, et al. Self-calibrating programmable photonic integrated circuits. Nat Photon. 2022;16(8):595–602.

[38] Mecozzi A, Antonelli C, Shtaif M. Kramers–Kronig receivers. Adv Opt Photonics. 2019;11(3):480–517.

[39] Zhu Z, et al. Coherent general-purpose photonic matrix processor. ACS Photonics. 2024;11(3):1189–96.

[40] Zhang M, et al. Tempo: efficient time-multiplexed dynamic photonic tensor core for edge AI with compact slow-light electro-optic modulator. J Appl Phys. 2024. https://doi.org/10.1063/5.0203036.

[41] Li C, et al. The challenges of modern computing and new opportunities for optics. PhotoniX. 2021;2(1):20.

[42] Mourgias-Alexandris G, et al. Noise-resilient and high-speed deep learning with coherent silicon photonics. Nat Commun. 2022;13(1):5572.

[43] Youngblood N. Coherent photonic crossbar arrays for large-scale matrix-matrix multiplication. IEEE J Sel Top Quantum Electron. 2023;29(2):1–11.

[44] Rahimi Kari S, et al. Realization of an integrated coherent photonic platform for scalable matrix operations. Optica. 2024;11(4):542–51.

[45] McMahon PL. The physics of optical computing. Nat Rev Phys. 2023;5(12):717–34.

[46] Meng X, et al. Compact optical convolution processing unit based on multimode interference. Nat Commun. 2023;14(1):3000.

[47] Ashtiani F, Geers AJ, Aflatouni F. An on-chip photonic deep neural network for image classification. Nature. 2022;606(7914):501–6.

[48] Xu X, et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature. 2021;589(7840):44–51.

[49] Shastri BJ, et al. Photonics for artificial intelligence and neuromorphic computing. Nat Photon. 2021;15(2):102–14.

[50] Zhang H, et al. An optical neural chip for implementing complex-valued neural network. Nat Commun. 2021;12(1):457.

[51] Sze V, et al. Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE. 2017;105(12):2295–329.

[52] Tian Y, et al. Photonic neural networks with Kramers–Kronig activation. Adv Photon Res. 2023;4(9):2300062.

[53] Tang S, et al. Reconfigurable integrated photonic unitary neural networks with phase encoding enabled by in-situ training. IEEE Photon J. 2024;16(5):1–11.

[54] Choromanski K, et al. Rethinking attention with performers. ArXiv. 2020;abs/2009.14794.

[55] Rahimi A, Recht B. Random features for large-scale kernel machines. In: Proceedings of the 21st International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc.; 2007. p. 1177–84.

[56] Fang MYS, et al. Design of optical neural networks with component imprecisions. Opt Express. 2019;27(10):14009–29.

[57] Cem A, et al. Thermal crosstalk modeling and compensation for programmable photonic processors. In: 2023 IEEE Photonics Conference (IPC). 2023.

[58] Milanizadeh M, et al. Control and calibration recipes for photonic integrated circuits. IEEE J Sel Top Quantum Electron. 2020;26(5):1–10.

[59] Bai B, et al. Microcomb-based integrated photonic processing unit. Nat Commun. 2023;14(1):66.

[60] Lightelligence. Tianshu optical-electrical hybrid computing card. 2025. Available from: https://www.xztech.ai/index.php/product/index/6.html.

[61] Xue Z, et al. Fully forward mode training for optical neural networks. Nature. 2024;632(8024):280–6.

[62] Zhu H, et al. Lightening-transformer: a dynamically-operated optically-interconnected photonic transformer accelerator. In: 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 2024.

[63] Su J, et al. Roformer: enhanced transformer with rotary position embedding. ArXiv. 2021;abs/2104.09864.

[64] Wei X, et al. Videorope: what makes for good video rotary position embedding? ArXiv. 2025;abs/2502.05173.

[65] Bai J, et al. Qwen technical report. ArXiv. 2023;abs/2309.16609.

[66] Zeng A, et al. Chatglm: a family of large language models from glm-130b to glm-4 all tools. ArXiv. 2024;abs/2406.12793.

[67] Touvron H, et al. Llama: open and efficient foundation language models. ArXiv. 2023;abs/2302.13971.

[68] Jouppi NP, et al. Tpu v4: an optically reconfigurable supercomputer for machine learning with hardware support for embeddings. In: Proceedings of the 50th Annual International Symposium on Computer Architecture. 2023.

[69] Wang YE, Wei GY, Brooks DM. Benchmarking tpu, gpu, and cpu platforms for deep learning. ArXiv. 2019;abs/1907.10701.

[70] Hertel IV, Schulz CP. Coherence and photons. In: Atoms, molecules and optical physics 2: molecules and photons – spectroscopy and collisions. Berlin, Heidelberg: Springer; 2015. p. 71–134.

[71] Ahmed SR, et al. Universal photonic artificial intelligence acceleration. Nature. 2025;640(8058):368–74.

[72] Hua S, et al. An integrated large-scale photonic accelerator with ultralow latency. Nature. 2025;640(8058):361–7.

[73] Fann CH, et al. Novel parallel digital optical computing system (DOC) for generative AI. In: 2024 IEEE International Electron Devices Meeting (IEDM). 2024.

[74] Han C, et al. Slow-light silicon modulator with 110-GHz bandwidth. Sci Adv. 2023;9(42):eadi5339.

[75] Alloatti L, et al. 100 GHz silicon–organic hybrid modulator. Light Sci Appl. 2014;3(5):e173.

[76] Filipovich MJ, et al. Silicon photonic architecture for training deep neural networks with direct feedback alignment. Optica. 2022;9(12):1323–32.

[77] Dao T, et al. Flashattention: fast and memory-efficient exact attention with io-awareness. ArXiv. 2022;abs/2205.14135.

[78] Kwon W, et al. Efficient memory management for large language model serving with pagedattention. In: Proceedings of the 29th Symposium on Operating Systems Principles. 2023.

[79] Pope R, et al. Efficiently scaling transformer inference. ArXiv. 2022;abs/2211.05102.

[80] Ivanov A, et al. Data movement is all you need: a case study on optimizing transformers. ArXiv. 2020;abs/2007.00072.

[81] Shao Z, et al. Deepseek-v2: a strong, economical, and efficient mixture-of-experts language model. ArXiv. 2024;abs/2405.04434.

[82] Kumar SK. On weight initialization in deep neural networks. ArXiv. 2017;abs/1704.08863.

[83] He K, et al. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: IEEE International Conference on Computer Vision (ICCV). 2015. p. 1026–34.

[84] Narkhede MV, Bartakke PP, Sutaone MS. A review on weight initialization strategies for neural networks. Artif Intell Rev. 2022;55(1):291–322.