Citation: Ye Tian, Shuiying Xiang, Xingxing Guo, Yahui Zhang, Jiashang Xu, Shangxuan Shi, Haowen Zhao, Yizhi Wang, Xinran Niu, Wenzhuo Liu, Yue Hao. Photonic transformer chip: interference is all you need[J]. PhotoniX. doi: 10.1186/s43074-025-00182-7

Photonic transformer chip: interference is all you need

doi: 10.1186/s43074-025-00182-7
Funds:  This work was supported by the National Key Research and Development Program of China (2021YFB2801900, 2021YFB2801901, 2021YFB2801902, 2021YFB2801904); National Natural Science Foundation of China (No. 61974177); National Outstanding Youth Science Fund Project of National Natural Science Foundation of China (62022062); The Fundamental Research Funds for the Central Universities (QTZX23041).
  • Received Date: 2025-04-10
  • Accepted Date: 2025-07-29
  • Revised Date: 2025-07-19
  • Available Online: 2025-10-31
  • As the core component of the transformer model, attention has proved to be "all you need" across artificial intelligence in recent years. However, conventional electronic processors cannot cope with the exponentially growing hardware cost and energy consumption of the computationally expensive attention operation. While photonic neural network (NN) chips offer energy-efficient alternatives for accelerating matrix multiplication (MM), existing photonic accelerators are designed primarily for weight-static NNs, in which MM is performed between a learned weight matrix and input tensors; they are therefore inefficient for attention mechanisms, which require dynamic input operands. Here we propose an attention mechanism that relies solely on runtime-programmable optical interference. Through theoretical analysis, numerical simulation, and experimental validation, we demonstrate that this photonic "all-interference" attention has learning capability equivalent to classical self-attention, and we implement a photonic transformer chip (PTC). Evaluation shows that the PTC promises to exceed 200 peta-operations per second (POPS) with a computation density of 1 POPS/mm² and a power efficiency of 0.5 POPS/W, substantially better than prior photonic accelerators, and delivers more than 200× energy reduction and 2 to 3 orders of magnitude higher computation capability than its electronic counterpart. The photonic transformer with "all-interference" attention proposed in this work highlights the immense potential of photonics to build its own computing paradigm for general-purpose machine learning.
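For context on the operation the PTC accelerates, the following is a minimal NumPy sketch of classical scaled-dot-product self-attention (Vaswani et al., 2017), the baseline whose learning capability the photonic "all-interference" attention is reported to match. It is purely illustrative of why attention is awkward for weight-static accelerators: both matrix multiplications take runtime-dependent operands. All variable names are hypothetical and this is not the authors' photonic implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Classical self-attention.

    Q, K, V: (sequence_length, d_k) arrays. Both products below
    (Q K^T and the weighting of V) multiply two dynamic operands,
    unlike the weight-static MM targeted by most photonic NN chips.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # dynamic-dynamic MM
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V                                    # second dynamic MM

# Illustrative usage: 8 tokens with embedding dimension 16
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 16)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (8, 16)
```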