Multi-modal Deep Learning and Its Applications in Ophthalmic Artificial Intelligence
Abstract: Deep learning, with its strong learning capability and ease of use, has become the mainstream machine learning approach and a core technique for artificial intelligence (AI) in medicine and healthcare. Because medical imaging plays a key role in many tasks, including health screening, disease diagnosis, precision treatment, and prognosis assessment, deep learning for structural analysis and semantic understanding of medical images is becoming an important interdisciplinary research direction. In clinical practice, to reach a more accurate diagnosis, doctors often need to refer to images of different types and modalities at the same time for comprehensive analysis and judgment. This article introduces the basic concepts and working principles of multi-modal deep learning for such scenarios, reviews, with concrete cases, recent research progress on applying multi-modal deep learning in general medicine and in ophthalmology, discusses the technical challenges, and envisions potential applications of multi-modal deep learning in AI-assisted ophthalmology.
Key words:
- multi-modal deep learning
- ophthalmology
- artificial intelligence
- assisted diagnosis
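Table 1. Examples of single-modal deep learning applications in ophthalmology

| Year | Authors | Task | Single-modal input |
| --- | --- | --- | --- |
| 2016 | Gulshan et al. [7] | Referable/non-referable DR classification | Single color fundus photograph |
| 2017 | Burlina et al. [8] | AMD grading | Single color fundus photograph |
| 2018 | Kermany et al. [9] | Multi-disease recognition | OCT image sequence |
| 2018 | Wei et al. [10] | Laser scar detection | Single color fundus photograph |
| 2019 | Lai et al. [11] | Left/right eye recognition | Single color fundus photograph |
| 2019 | Xu et al. [12] | Nuclear cataract grading | Single slit-lamp photograph |
| 2019 | Yang et al. [13] | Joint optic disc-macula localization | Single ultra-widefield fundus image |
| 2020 | Wu et al. [14] | Abnormality detection | Single OCT B-scan image |
| 2020 | Ding et al. [15] | Optic disc/cup segmentation | Single color fundus photograph |
| 2020 | Ding et al. [16] | RNFLD detection | Single color fundus photograph |
| 2020 | Wei et al. [17] | Fundus lesion segmentation, DR grading | Single color fundus photograph |
| 2020 | Li et al. [18] | ROP detection | Multiple color fundus photographs |
| 2021 | Li et al. [19] | Multi-disease recognition | Single color fundus photograph |
| 2021 | Zhang et al. [20] | Multi-disease recognition | Single ultra-widefield fundus image |

DR: diabetic retinopathy; AMD: age-related macular degeneration; OCT: optical coherence tomography; RNFLD: retinal nerve fiber layer defect; ROP: retinopathy of prematurity

Table 2. Examples of multi-modal deep learning applications in medicine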
[1] Etzioni O, Decario N. AI can help scientists find a COVID-19 vaccine[EB/OL]. [2021-06-16]. https://www.wired.com/story/opinion-ai-can-help-find-scientists-find-a-covid-19-vaccine
[2] Laguarta J, Hueto F, Subirana B. COVID-19 artificial intelligence diagnosis using only cough recordings[J]. IEEE Open J Eng Med Biol, 2020, 1: 275-281. doi: 10.1109/OJEMB.2020.3026928
[3] Zeeberg A. D.I.Y. Artificial intelligence comes to a Japanese family farm[EB/OL]. [2021-06-16]. https://www.newyorker.com/tech/annals-of-technology/diy-artificial-intelligence-comes-to-a-japanese-family-farm
[4] Bengio Y. Learning deep architectures for AI[G]. Foundations and Trends® in Machine Learning, 2009, 2: 1-127.
[5] Schmidhuber J. Deep learning in neural networks: An overview[J]. Neural Netw, 2015, 61: 85-117. http://www.onacademic.com/detail/journal_1000036789998910_729a.html
[6] Zheng A, Casari A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists[M]. New York: O'Reilly Media Inc., 2018.
[7] Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs[J]. JAMA, 2016, 316: 2402-2410. doi: 10.1001/jama.2016.17216
[8] Burlina PM, Joshi N, Pekala M, et al. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks[J]. JAMA Ophthalmol, 2017, 135: 1170-1176. doi: 10.1001/jamaophthalmol.2017.3782
[9] Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning[J]. Cell, 2018, 172: 1122-1131.e9. doi: 10.1016/j.cell.2018.02.010
[10] Wei Q, Li X, Wang H, et al. Laser scar detection in fundus images using convolutional neural network[C]. ACCV, 2018: 191-206.
[11] Lai X, Li X, Qian R, et al. Four models for automatic recognition of left and right eye in fundus images[C]. MMM, 2019: 507-517.
[12] Xu C, Zhu X, He W, et al. Fully deep learning for slit-lamp photo based nuclear cataract grading[C]. MICCAI, 2019: 513-521.
[13] Yang Z, Li X, He X, et al. Joint localization of optic disc and fovea in ultra-widefield fundus images[C]. MLMI, 2019: 453-460.
[14] Wu J, Zhang Y, Wang J, et al. AttenNet: Deep attention based retinal disease classification in OCT images[C]. MMM, 2020: 565-576.
[15] Ding F, Yang G, Wu J, et al. High-order attention networks for medical image segmentation[C]. MICCAI, 2020: 253-262.
[16] Ding F, Yang G, Ding D, et al. Retinal nerve fiber layer defect detection with position guidance[C]. MICCAI, 2020: 745-754.
[17] Wei Q, Li X, Yu W, et al. Learn to segment retinal lesions and beyond[C]. ICPR, 2020: 7403-7410.
[18] Li X, Wan W, Zhou Y, et al. Deep multiple instance learning with spatial attention for ROP case classification, instance selection and abnormality localization[C]. ICPR, 2020: 7293-7298.
[19] Li B, Chen H, Zhang B, et al. Development and evaluation of a deep learning model for the detection of multiple fundus diseases based on colour fundus photography[J]. Br J Ophthalmol, 2021. doi: 10.1136/bjophthalmol-2020-316290
[20] Zhang C, He F, Li B, et al. Development of a deep-learning system for detection of lattice degeneration, retinal breaks, and retinal detachment in tessellated eyes using ultra-wide-field fundus images: A pilot study[J]. Graefes Arch Clin Exp Ophthalmol, 2021, 259: 2225-2234. doi: 10.1007/s00417-021-05105-3
[21] Zhang C, Yang Z, He X, et al. Multimodal intelligence: Representation learning, information fusion, and applications[J]. IEEE J Sel Top Signal Process, 2020, 14: 478-493. doi: 10.1109/JSTSP.2020.2987728
[22] Baltrušaitis T, Ahuja C, Morency LP. Multimodal machine learning: A survey and taxonomy[J]. IEEE Trans Pattern Anal Mach Intell, 2018, 41: 423-443. http://arxiv.org/pdf/1705.09406
[23] Wang J, Tian K, Ding D, et al. Unsupervised domain expansion for visual categorization[J]. ACM Trans Multimedia Comput Commun Appl, 2021. https://arxiv.org/abs/2104.00233
[24] Li X, Jia M, Islam M T, et al. Self-supervised feature learning via exploiting multi-modal data for retinal disease diagnosis[J]. IEEE Trans Med Imaging, 2020, 39: 4023-4033. doi: 10.1109/TMI.2020.3008871
[25] Wang W, Xu Z, Yu W, et al. Two-stream CNN with loose pair training for multi-modal AMD categorization[C]. MICCAI, 2019: 156-164.
[26] Chen RJ, Lu MY, Wang J, et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis[J]. IEEE Trans Med Imaging, 2020. doi: 10.1109/TMI.2020.3021387
[27] Yang J, Yang Z, Mao Z, et al. Bi-modal deep learning for recognizing multiple retinal diseases based on color fundus photos and OCT images[C]. ARVO Annual Meeting, 2021.
[28] Wang J, Miao J, Yang X, et al. Auto-weighting for breast cancer classification in multi-modal ultrasound[C]. MICCAI, 2020: 190-199.
[29] Zhou T, Fu H, Zhang Y, et al. M2Net: Multi-modal multi-channel network for overall survival time prediction of brain tumor patients[C]. MICCAI, 2020: 221-231.
[30] Jiang X, Luo Q, Wang Z, et al. Multiphase and multi-level selective feature fusion for automated pancreas segmentation from CT images[C]. MICCAI, 2020: 460-469.
[31] Peng Y, Bi L, Fulham M, et al. Multi-modality information fusion for radiomics-based neural architecture search[C]. MICCAI, 2020: 763-771.
[32] Wang W, Xu Z, Yu W, et al. Two-stream CNN with loose pair training for multi-modal AMD categorization[C]. MICCAI, 2019: 156-164.
[33] Xu Z, Wang W, Yang J, et al. Automated diagnoses of age-related macular degeneration and polypoidal choroidal vasculopathy using bi-modal deep convolutional neural networks[J]. Br J Ophthalmol, 2021, 105: 561-566. doi: 10.1136/bjophthalmol-2020-315817
[34] Zhu J, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. ICCV, 2017: 2223-2232.
[35] Li X, Zhou Y, Wang J, et al. Multi-modal multi-instance learning for retinal disease recognition[C]. ACMMM, 2021. doi: 10.1145/3474085.3475418