Ziwei Xu, Huan Tian, Zhen Zeng, Lingjie Zhang, Yaowen Zhang, Heping Li, Zhiyao Zhang, Yong Liu. Harnessing nonlinear optoelectronic oscillator for speeding up reinforcement learning[J]. PhotoniX. doi: 10.1186/s43074-025-00163-w

Harnessing nonlinear optoelectronic oscillator for speeding up reinforcement learning

doi: 10.1186/s43074-025-00163-w
Funds: This work was supported by the National Natural Science Foundation of China (NSFC) (61927821, 62305046) and the Fundamental Research Funds for the Central Universities (ZYGX2020ZB012).
  • Received Date: 2024-06-11
  • Revised Date: 2024-11-16
  • Accepted Date: 2025-02-13
  • Available Online: 2025-03-01
  • Reinforcement learning is an indispensable branch of artificial intelligence (AI) that covers the techniques and methods for maximizing rewards from an uncertain environment. As Moore’s law comes to an end, the operating speed and energy consumption of advanced integrated circuits are increasingly unable to meet the ever-growing requirements of reinforcement learning. In recent years, photonic accelerators have emerged as a powerful candidate for solving this issue. Here, a new photonic accelerator based on a nonlinear optoelectronic oscillator (NOEO) is proposed and demonstrated to solve the multi-armed bandit (MAB) problem and to simulate the Tic Tac Toe (TTT) game, both of which are among the best-known reinforcement learning problems. By adjusting the balance between the gain and the nonlinearity in the NOEO cavity, four parallel orthogonal chaotic sequences are generated with a 6-dB bandwidth of up to 18.18 GHz and a permutation entropy (PE) as high as 0.9983. With the assistance of the tug-of-war and time differential methods, a 512-armed bandit problem and an intelligent TTT game are successfully accelerated, respectively. This work presents an innovative photonic accelerator for solving reinforcement learning problems more efficiently. Beyond reinforcement learning, the proposed scheme can find applications in other fields of AI, such as reservoir computing and neural networks.
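The abstract quotes a permutation entropy of 0.9983 without defining the quantity. As a minimal sketch, assuming the commonly used normalized Bandt-Pompe definition (the Shannon entropy of ordinal patterns of length m, divided by log(m!)), the measure can be computed as follows. The function name, the embedding order, and the delay are illustrative choices and are not taken from the paper; pseudorandom numbers stand in for a measured chaotic waveform.

```python
import math
import random


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] (Bandt-Pompe style sketch).

    order and delay are hypothetical defaults, not values from the paper.
    """
    n_patterns = len(series) - (order - 1) * delay
    if order < 2 or n_patterns < 1:
        raise ValueError("series is too short for the chosen order and delay")

    counts = {}
    for start in range(n_patterns):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        # sorted() is stable, so ties are broken by position.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] = counts.get(pattern, 0) + 1

    # Shannon entropy of the observed pattern frequencies.
    entropy = 0.0
    for count in counts.values():
        p = count / n_patterns
        entropy -= p * math.log(p)

    # Divide by the maximum possible entropy, log(order!).
    return entropy / math.log(math.factorial(order))


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.random() for _ in range(20000)]  # stand-in for a chaotic sequence
    ramp = list(range(100))                       # fully predictable sequence
    print("noise:", permutation_entropy(noise, order=4))
    print("ramp: ", permutation_entropy(ramp, order=3))
```

Under this definition a perfectly regular sequence scores 0 and a sequence whose ordinal patterns are all equally likely scores 1, so a value close to 1 indicates a highly unpredictable waveform.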