From compressive sampling to compressive tasking: retrieving semantics in compressed domain with low bandwidth

1.
Department of Automation, Tsinghua University, 100084, Beijing, China
2.
Research Center for Industries of the Future and School of Engineering, Westlake University, 310024, Hangzhou, China
3.
Key Laboratory of 3D Micro/Nano Fabrication and Characterization of Zhejiang Province, Westlake University, 310024, Hangzhou, China
4.
Computer Network Information Center, Chinese Academy of Sciences, 100190, Beijing, China
5.
University of Chinese Academy of Sciences, 100049, Beijing, China
6.
Wyant College of Optical Sciences, University of Arizona, AZ 85721, Tucson, USA

Funds: X. Yuan would like to thank the Research Center for Industries of the Future (RCIF) at Westlake University for supporting this work and the funding from Lochn Optics.

摘要

- /
- /
- /
Abstract: High-throughput imaging is highly desirable in intelligent analysis of computer vision tasks. In conventional design, throughput is limited by the separation between physical image capture and digital post processing. Computational imaging increases throughput by mixing analog and digital processing through the image capture pipeline. Yet, recent advances of computational imaging focus on the “compressive sampling”, this precludes the wide applications in practical tasks. This paper presents a systematic analysis of the next step for computational imaging built on snapshot compressive imaging (SCI) and semantic computer vision (SCV) tasks, which have independently emerged over the past decade as basic computational imaging platforms.SCI is a physical layer process that maximizes information capacity per sample while minimizing system size, power and cost. SCV is an abstraction layer process that analyzes image data as objects and features, rather than simple pixel maps. In current practice, SCI and SCV are independent and sequential. This concatenated pipeline results in the following problems: i) a large amount of resources are spent on task-irrelevant computation and transmission, ii) the sampling and design efficiency of SCI is attenuated, and iii) the final performance of SCV is limited by the reconstruction errors of SCI. Bearing these concerns in mind, this paper takes one step further aiming to bridge the gap between SCI and SCV to take full advantage of both approaches.After reviewing the current status of SCI, we propose a novel joint framework by conducting SCV on raw measurements captured by SCI to select the region of interest, and then perform reconstruction on these regions to speed up processing time. We use our recently built SCI prototype to verify the framework. Preliminary results are presented and the prospects for a joint SCI and SCV regime are discussed. By conducting computer vision tasks in the compressed domain, we envision that a new era of snapshot compressive imaging with limited end-to-end bandwidth is coming.
- Snapshot compressive imaging /
- Semantic computer vision /
- Computational imaging /
- Deep learning

HTML全文

参考文献(78)

[1]	Boyle WS, Smith GE. Charge coupled semiconductor devices. Bell Syst Tech J. 1970;49(4):587–93.
[2]	Altmann Y, McLaughlin S, Padgett MJ, Goyal VK, Hero AO, Faccio D. Quantum-inspired computational imaging. Science. 2018;361(6403):eaat2298. https://doi.org/10.1126/science.aat2298.
[3]	Mait JN, Euliss GW, Athale RA. Computational imaging. Adv Opt Photonics. 2018;10(2):409–83.
[4]	Yuan X, Brady DJ, Katsaggelos AK. Snapshot compressive imaging: theory, algorithms, and applications. IEEE Signal Process Mag. 2021;38(2):65–88.
[5]	Gao L, Liang J, Li C, Wang LV. Single-shot compressed ultrafast photography at one hundred billion frames per second. Nature. 2014;516(7529):74–7.
[6]	Raskar R, Agrawal A, Tumblin J. Coded exposure photography: motion deblurring using fluttered shutter. ACM Trans Graphics. 2006;25(3):795–804.
[7]	Sitzmann V, Diamond S, Peng Y, Dun X, Boyd S, Heidrich W, et al. End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Trans Graphics. 2018;37(4):1–13.
[8]	Sun Q, Zhang J, Dun X, Ghanem B, Peng Y, Heidrich W. End-to-end learned, optically coded super-resolution SPAD camera. ACM Trans Graph. 2020;39(2):1–14.
[9]	Antipa N, Oare P, Bostan E, Ng R, Waller L. Video from stills: lensless imaging with rolling shutter. In: 2019 IEEE International Conference on Computational Photography (ICCP). IEEE; 2019. p. 1-8.
[10]	Asif MS, Ayremlou A, Sankaranarayanan A, Veeraraghavan A, Baraniuk RG. FlatCam: thin, lensless cameras using coded aperture and computation. IEEE Trans Comput Imaging. 2017;3(3):384–97.
[11]	Cai Z, Chen J, Pedrini G, Osten W, Liu X, Peng X. Lensless light-field imaging through diffuser encoding. Light Sci Appl. 2020;9(1):143.
[12]	Hu C, Huang H, Chen M, Yang S, Chen H. FourierCam: a camera for video spectrum acquisition in a single shot. Photon Res. 2021;9(5):701.
[13]	Liang CK, Lin TH, Wong BY, Liu C, Chen HH. Programmable aperture photography: multiplexed light field acquisition. ACM Trans Graph. 2008;27(3):391–400.
[14]	Lv X, Li Y, Zhu S, Guo X, Zhang J, Lin J, et al. Snapshot spectral polarimetric light field imaging using a single detector. Opt Lett. 2020;45(23):6522.
[15]	Hu C, Huang H, Chen M, Yang S, Chen H. Video object detection from one single image through opto-electronic neural network. APL Photon. 2021;6(4):046104.
[16]	Okawara T, Yoshida M, Nagahara H, Yagi Y. Action recognition from a single coded image. In: 2020 IEEE International Conference on Computational Photography (ICCP). IEEE; 2020. p. 1-11.
[17]	Wu Y, Boominathan V, Chen H, Sankaranarayanan A, Veeraraghavan A. PhaseCam3D — learning phase masks for passive single view depth estimation. In: 2019 IEEE International Conference on Computational Photography (ICCP). IEEE; 2019. p. 1-12.
[18]	Audebert N, Le Saux B, Lefevre S. Deep learning for classification of hyperspectral data: a comparative review. IEEE Geosci Remote Sens Mag. 2019;7(2):159–73.
[19]	Rawat W, Wang Z. Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput. 2017;29(9):2352–449.
[20]	Asgari Taghanaki S, Abhishek K, Cohen JP, Cohen-Adad J, Hamarneh G. Deep semantic segmentation of natural and medical images: a review. Artif Intell Rev. 2021;54(1):137–78.
[21]	Yu H, Yang Z, Tan L, Wang Y, Sun W, Sun M, et al. Methods and datasets on semantic segmentation: a review. Neurocomputing. 2018;304:82–103.
[22]	Jiao L, Wang D, Bai Y, Chen P, Liu F. Deep learning in visual tracking: a review. IEEE Trans Neural Netw Learn Syst. 2021;1(1):1–20.
[23]	Pal SK, Pramanik A, Maiti J, Mitra P. Deep learning in multi-object detection and tracking: state of the art. Appl Intell. 2021;51(9):6400–29.
[24]	Zhu H, Wei H, Li B, Yuan X, Kehtarnavaz N. A review of video object detection: datasets, metrics and methods. Appl Sci. 2020;10(21):7834.
[25]	Aafaq N, Mian A, Liu W, Gilani SZ, Shah M. Video description: a survey of methods, datasets, and evaluation metrics. ACM Comput Surv. 2020;52(6):1–37.
[26]	Hossain MZ, Sohel F, Shiratuddin MF, Laga H. A comprehensive survey of deep learning for image captioning. ACM Comput Surv. 2019;51(6):1–36.
[27]	Guo Y, Liu Y, Georgiou T, Lew MS. A review of semantic segmentation using deep neural networks. Int J Multimed Info Retr. 2018;7(2):87–93.
[28]	Herath S, Harandi M, Porikli F. Going deeper into action recognition: a survey. Image Vision Comput. 2017;60:4–21.
[29]	Li S, Deng W. Deep facial expression recognition: a survey. IEEE Trans Affect Comput. 2020;1(1):1–10.
[30]	Pawar PG, Devendran V. Scene understanding: a survey to see the world at a single glance. In: 2019 2nd International Conference on Intelligent Communication and Computational Techniques (ICCT). IEEE; 2019. p. 182-6.
[31]	Chen S, Yao T, Jiang YG. Deep learning for video captioning: a review. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI). International Joint Conferences on Artificial Intelligence Organization; 2019. p. 6283-90.
[32]	Deng C, Zhang Y, Mao Y, Fan J, Suo J, Zhang Z, et al. Sinusoidal sampling enhanced compressive camera for high speed imaging. IEEE Trans Pattern Anal Mach Intell. 2021;43(4):1380–93.
[33]	Hitomi Y, Gu J, Gupta M, Mitsunaga T, Nayar SK. Video from a single coded exposure photograph using a learned over-complete dictionary. In: 2011 International Conference on Computer Vision (ICCV). IEEE; 2011. p. 287-94.
[34]	Llull P, Liao X, Yuan X, Yang J, Kittle D, Carin L, et al. Coded aperture compressive temporal imaging. Opt Express. 2013;21(9):10526.
[35]	Lu R, Chen B, Liu G, Cheng Z, Qiao M, Yuan X. Dual-view snapshot compressive imaging via optical flow aided recurrent neural network. Int J Comput Vision. 2021;129(12):3279–98.
[36]	Qiao M, Liu X, Yuan X. Snapshot spatial-temporal compressive imaging. Opt Lett. 2020;45(7):1659–62.
[37]	Qiao M, Meng Z, Ma J, Yuan X. Deep learning for video compressive sensing. APL Photonics. 2020;5(3):030801.
[38]	Reddy D, Veeraraghavan A, Chellappa R. P2C2: programmable pixel compressive camera for high speed imaging. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2011. p. 329-36.
[39]	Shedligeri P, S A, Mitra K. A unified framework for compressive video recovery from coded exposure techniques. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE; 2021. p. 1600-9.
[40]	Yoshida M, Sonoda T, Nagahara H, Endo K, Sugiyama Y, Taniguchi RI. High-speed imaging using CMOS image sensor with quasi pixel-wise exposure. IEEE Trans Comput Imaging. 2020;6:463–76.
[41]	Yuan X, Llull P, Liao X, Yang J, Brady DJ, Sapiro G, et al. Low-cost compressive sensing for color video and depth. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2014. p. 3318-25.
[42]	Zhang Z, Deng C, Liu Y, Yuan X, Suo J, Dai Q. Ten-mega-pixel snapshot compressive imaging with a hybrid coded aperture. Photonics Res. 2021;9(11):2277.
[43]	Wei M, Sarhangnejad N, Xia Z, Gusev N, Katic N, Genov R, et al. Coded two-bucket cameras for computer vision. In: European Conference on Computer Vision (ECCV). Springer; 2018. p. 54-71.
[44]	Wang P, Liang J, Wang LV. Single-shot ultrafast imaging attaining 70 trillion frames per second. Nat Commun. 2020;11(1):2091.
[45]	Liu Y, Yuan X, Suo J, Brady DJ, Dai Q. Rank minimization for snapshot compressive imaging. IEEE Trans Pattern Anal Mach Intell. 2019;41(12):2990–3006.
[46]	Yuan X. Generalized alternating projection based total variation minimization for compressive sensing. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE; 2016. p. 2539-43.
[47]	Yuan X, Liu Y, Suo J, Dai Q. Plug-and-play algorithms for large-scale snapshot compressive imaging. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2020. p. 1444-54.
[48]	Jalali S, Yuan X. Snapshot compressed sensing: performance bounds and algorithms. IEEE Trans Inf Theory. 2019;65(12):8005–24.
[49]	Jalali S, Yuan X, Compressive imaging via one-shot measurements. In: 2018 IEEE International Symposium on Information Theory (ISIT). IEEE; 2018. p. 416–20.
[50]	Bioucas-Dias JM, Figueiredo MAT. A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans Image Process. 2007;16(12):2992–3004.
[51]	Cheng Z, Lu R, Wang Z, Zhang H, Chen B, Meng Z, et al. BIRNAT: bidirectional recurrent neural networks with adversarial training for video snapshot compressive imaging. In: European Conference on Computer Vision (ECCV). Springer; 2020. p. 258-75.
[52]	Iliadis M, Spinoulas L, Katsaggelos AK. Deep fully-connected networks for video compressive sensing. Digit Signal Process. 2018;72:9–18.
[53]	Ma J, Liu XY, Shou Z, Yuan X. Deep tensor ADMM-Net for snapshot compressive imaging. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE; 2019. p. 10222-31.
[54]	Wang Z, Zhang H, Cheng Z, Chen B, Yuan X. MetaSCI: scalable and adaptive reconstruction for video compressive sensing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2021. p. 2083-92.
[55]	Wu Z, Zhang J, Mou C. Dense deep unfolding network with 3D-CNN prior for snapshot compressive imaging. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE; 2021. p. 4892-901.
[56]	Yang J, Liao X, Yuan X, Llull P, Brady DJ, Sapiro G, et al. Compressive sensing by learning a Gaussian mixture model from measurements. IEEE Trans Image Process. 2015;24(1):106–19.
[57]	Yang J, Yuan X, Liao X, Llull P, Brady DJ, Sapiro G, et al. Video compressive sensing using Gaussian mixture models. IEEE Trans Image Process. 2014;23(11):4863–78.
[58]	Cheng Z, Chen B, Liu G, Zhang H, Lu R, Wang Z, et al. Memory-efficient network for large-scale video compressive sensing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2021. p. 16246-55.
[59]	Yuan X, Liu Y, Suo J, Durand F, Dai Q. Plug-and-play algorithms for video snapshot compressive imaging. IEEE Trans Pattern Anal Mach Intell. 2021;1(1):1–18.
[60]	Liao X, Li H, Carin L. Generalized alternating projection for weighted-$ \ell _{2,1} $ minimization with applications to model-based compressive sensing. SIAM J Imaging Sci. 2014;7(2):797–823.
[61]	Boyd S. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends® Mach Learn. 2010;3(1):1–122.
[62]	Bethi YRT, Narayanan S, Rangan V, Chakraborty A, Thakur CS. Real-time object detection and localization in compressive sensed video. In: 2021 IEEE International Conference on Image Processing (ICIP). IEEE; 2021. p. 1489-93.
[63]	Kwan C, Chou B, Yang J, Rangamani A, Tran T, Zhang J, et al. Target tracking and classification using compressive measurements of MWIR and LWIR coded aperture cameras. J Signal Inf Process. 2019;10(03):73–95.
[64]	Lu S, Yuan X, Shi W, Edge compression: an integrated framework for compressive imaging processing on CAVs. In: 2020 IEEE/ACM Symposium on Edge Computing (SEC). IEEE; 2020. p. 125–38.
[65]	Kwan C, Chou B, Yang J, Rangamani A, Tran T, Zhang J, et al. Deep learning-based target tracking and classification for low quality videos using coded aperture cameras. Ah S Sens. 2019;19(17):3702.
[66]	Kwan C, Chou B, Yang J, Rangamani A, Tran T, Zhang J, et al. Target tracking and classification using compressive sensing camera for SWIR videos. Signal Image Video Process. 2019;13(8):1629–37.
[67]	Rezaei M, Terauchi M, Klette R. Robust vehicle detection and distance estimation under challenging lighting conditions. IEEE Trans Intell Transp Syst. 2015;16(5):2723–43.
[68]	Zhe T, Huang L, Wu Q, Zhang J, Pei C, Li L. Inter-vehicle distance estimation method based on monocular vision using 3D detection. IEEE Trans Veh Technol. 2020;69(5):4907–19.
[69]	Yuan X, Yang J, Llull P, Liao X, Sapiro G, Brady DJ, et al. Adaptive temporal compressive sensing for video. In: 2013 IEEE International Conference on Image Processing (ICIP). IEEE; 2013. p. 14-8.
[70]	Zheng S, Wang C, Yuan X, Xin HL. Super-compression of large electron microscopy time series by deep compressive sensing learning. Patterns. 2021;2(7):100292.
[71]	Zheng S, Yang X, Yuan X. Two-stage is enough: a concise deep unfolding reconstruction network for flexible video compressive sensing. arXiv preprint arXiv:2201.05810. 2022;1(1):1-10.
[72]	Gomez AN, Ren M, Urtasun R, Grosse RB. The reversible residual network: backpropagation without storing activations. In: Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS). vol. 30. Curran Associates, Inc.; 2017. p. 1-10.
[73]	Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell. 2018;40(6):1452–64.
[74]	Zhou X, Koltun V, Krähenbühl P. Tracking objects as points. In: European Conference on Computer Vision (ECCV). Springer; 2020. p. 474-90.
[75]	Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, et al. nuScenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2020. p. 11618-28.
[76]	Hu W, Tan T, Wang L, Maybank S. A survey on visual surveillance of object motion and behaviors. IEEE Syst Man Cybern Mag. 2004;34(3):334–52.
[77]	Zhao ZQ, Zheng P, Xu St, Wu X. Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst. 2019;30(11):3212-32.
[78]	Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Glaeser C, Timm F, et al. Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Trans Intell Transp Syst. 2020;22(3):1341–60.

施引文献

资源附件(0)

点击查看大图

计量

文章访问数: 174
HTML全文浏览量: 0
PDF下载量: 6
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

doi: 10.1186/s43074-022-00065-1

计量

From compressive sampling to compressive tasking: retrieving semantics in compressed domain with low bandwidth

计量

目录

PhotoniX

联系我们

友情链接

留言板

doi: 10.1186/s43074-022-00065-1

计量

出版历程

From compressive sampling to compressive tasking: retrieving semantics in compressed domain with low bandwidth

计量

出版历程

目录

PhotoniX

联系我们

友情链接