Multi-modal Deep Learning and Its Applications in Ophthalmic Artificial Intelligence

LI Xirong

doi: 10.12290/xhyxzz.2021-0500

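Funds:

Beijing Natural Science Foundation 4202033

Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund 19L2062

Pharmaceutical Collaborative Innovation Research Project of Beijing Science and Technology Commission Z191100007719002

Corresponding author: LI Xirong, Tel: 010-82504345, E-mail: xirong@ruc.edu.cn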

CLC number: R77; TP18
Publication history
- Received: 2021-06-28
- Accepted: 2021-07-29
- Published online: 2021-11-26
- Released: 2021-08-19
- Issue date: 2021-09-30

Abstract

With its strong learning capability and ease of use, deep learning has become the prevailing machine learning approach and a core technique for artificial intelligence (AI) in medicine. Given the key role of medical imaging in health screening, disease diagnosis, precision treatment, prognosis assessment, and many other tasks, deep learning for structural analysis and semantic understanding of medical images is becoming an important interdisciplinary research direction. In clinical practice, to reach a more accurate diagnosis, doctors often need to refer to images of different types and modalities at the same time for comprehensive analysis and judgment. This article introduces the basic concepts and working principles of multi-modal deep learning for such scenarios, reviews through concrete cases the research progress, applications, and technical challenges of multi-modal deep learning in general medical fields and in ophthalmology, and discusses the prospects of the technique in AI-assisted ophthalmology.

Keywords: multi-modal deep learning; ophthalmology; artificial intelligence; assisted diagnosis

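Figure 1 Examples of different types of ophthalmic images

A. Color fundus photograph; B. Fundus fluorescein angiography; C. Ultra-widefield fundus image; D. Optical coherence tomography; E. Slit-lamp photograph (oblique illumination)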

Figure 2 Three paradigms of multi-modal deep learning (dashed boxes)

A. Data-level fusion; B. Feature-level fusion; C. Task-level fusion
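
The three paradigms in Figure 2 differ in where the modalities are combined. The following is a minimal sketch of that difference, not the method of any cited work. All names (`data_level_fusion`, `mean_encoder`, `toy_classifier`, and so on) are hypothetical, the "images" are toy vectors, and the stand-in models are plain arithmetic rather than trained networks. Data-level fusion is shown as simple concatenation of raw inputs, which is only one possible way to merge data.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], Vector]


def data_level_fusion(inputs: Sequence[Vector], model: Model) -> Vector:
    """Merge the raw inputs first, then run one model on the merged input."""
    merged = [value for modality in inputs for value in modality]
    return model(merged)


def feature_level_fusion(inputs: Sequence[Vector],
                         encoders: Sequence[Model],
                         classifier: Model) -> Vector:
    """Encode each modality separately, concatenate the features, classify once."""
    features = [f for encode, x in zip(encoders, inputs) for f in encode(x)]
    return classifier(features)


def task_level_fusion(inputs: Sequence[Vector],
                      models: Sequence[Model]) -> Vector:
    """Run one complete model per modality and average their output scores."""
    per_modality = [model(x) for model, x in zip(models, inputs)]
    return [sum(scores) / len(per_modality) for scores in zip(*per_modality)]


# Hypothetical stand-ins for trained networks (toy arithmetic only).
def mean_encoder(x: Vector) -> Vector:
    return [sum(x) / len(x)]


def max_encoder(x: Vector) -> Vector:
    return [max(x)]


def toy_classifier(x: Vector) -> Vector:
    """Map a vector to two scores that sum to 1."""
    p = min(max(sum(x) / len(x), 0.0), 1.0)
    return [p, 1.0 - p]


if __name__ == "__main__":
    fundus, oct_scan = [0.25, 0.5], [0.5, 1.0]  # toy inputs, not real images
    print(data_level_fusion([fundus, oct_scan], toy_classifier))
    print(feature_level_fusion([fundus, oct_scan],
                               [mean_encoder, max_encoder], toy_classifier))
    print(task_level_fusion([fundus, oct_scan],
                            [toy_classifier, toy_classifier]))
```

In this sketch the only thing that changes between the three functions is the point at which the two modalities meet: before any model, between encoder and classifier, or after each model has produced its own scores.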

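Table 1 Examples of single-modal deep learning applications in ophthalmology

Year	Authors	Task	Single-modal input
2016	Gulshan et al. [7]	Referable/non-referable DR classification	Single color fundus photograph
2017	Burlina et al. [8]	AMD grading	Single color fundus photograph
2018	Kermany et al. [9]	Multi-disease recognition	OCT image sequence
2018	Wei et al. [10]	Laser scar detection	Single color fundus photograph
2019	Lai et al. [11]	Left/right eye recognition	Single color fundus photograph
2019	Xu et al. [12]	Nuclear cataract grading	Single slit-lamp photograph
2019	Yang et al. [13]	Joint localization of optic disc and macula	Single ultra-widefield fundus image
2020	Wu et al. [14]	Abnormality detection	Single OCT B-scan image
2020	Ding et al. [15]	Optic disc/optic cup segmentation	Single color fundus photograph
2020	Ding et al. [16]	RNFLD detection	Single color fundus photograph
2020	Wei et al. [17]	Fundus lesion segmentation, DR grading	Single color fundus photograph
2020	Li et al. [18]	ROP detection	Multiple color fundus photographs
2021	Li et al. [19]	Multi-disease recognition	Single color fundus photograph
2021	Zhang et al. [20]	Multi-disease recognition	Single ultra-widefield fundus image
DR: diabetic retinopathy; AMD: age-related macular degeneration; OCT: optical coherence tomography; RNFLD: retinal nerve fiber layer defect; ROP: retinopathy of prematurity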

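Table 2 Examples of multi-modal deep learning applications in medicine

Year	Authors	Task	Multi-modal input	Fusion level	Fusion strategy
2020	Wang et al. [28]	Breast cancer classification	Conventional ultrasound, color Doppler ultrasound, shear-wave elastography, strain elastography	Feature level	Feature concatenation
2020	Zhou et al. [29]	Overall survival prediction for brain tumor patients	MR images of 4 modalities (T1, T1ce, T2, FLAIR)	Feature level	Feature concatenation
2020	Chen et al. [26]	Cancer diagnosis and prognosis prediction	Histopathology images, genomic features	Feature level	Tensor fusion
2020	Jiang et al. [30]	Pancreas segmentation	Venous-phase CT, arterial-phase CT	Feature level	Multi-level selective feature fusion
2020	Peng et al. [31]	Prediction of distant cancer metastasis	PET, CT	Feature level	Neural architecture search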
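
Table 2 lists feature concatenation and tensor fusion as two feature-level strategies. As a rough illustration, and not the implementation used in [26] or any other cited work, the sketch below contrasts the two on toy vectors. It assumes the common outer-product formulation of tensor fusion, in which each feature vector is extended with a constant 1 so that the fused vector keeps the unimodal features alongside all pairwise products. The function names `concat_fusion` and `tensor_fusion` are hypothetical.

```python
from typing import List


def concat_fusion(a: List[float], b: List[float]) -> List[float]:
    """Feature concatenation: output length is len(a) + len(b)."""
    return list(a) + list(b)


def tensor_fusion(a: List[float], b: List[float]) -> List[float]:
    """Flattened outer product of [1, a] and [1, b] (assumed formulation).

    Output length is (len(a) + 1) * (len(b) + 1).
    """
    a1 = [1.0] + list(a)
    b1 = [1.0] + list(b)
    return [x * y for x in a1 for y in b1]


if __name__ == "__main__":
    image_features = [2.0, 3.0]   # toy stand-in for image features
    genomic_features = [5.0]      # toy stand-in for genomic features
    print(concat_fusion(image_features, genomic_features))
    print(tensor_fusion(image_features, genomic_features))
```

Concatenation grows linearly with the feature sizes, while the outer product grows multiplicatively, which is one reason the choice of strategy matters once feature vectors become large.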

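Table 3 Examples of multi-modal deep learning applications in ophthalmology

Year	Authors	Task	Multi-modal input	Fusion level	Fusion strategy
2019	Wang et al. [32]	AMD classification	Color fundus photograph, OCT image	Feature level	Feature concatenation
2020	Xu et al. [33]	AMD/PCV classification	Color fundus photograph, OCT image	Feature level	Feature concatenation
2020	Li et al. [24]	Recognition of specific fundus diseases	Color fundus photograph, algorithm-synthesized FFA	Data level	Sample mixing
2021	Yang et al. [27]	Recognition of multiple fundus diseases	Color fundus photograph, OCT image sequence	Task level	Score averaging
AMD, OCT: as in Table 1; PCV: polypoidal choroidal vasculopathy; FFA: fundus fluorescein angiography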

References (35)

[1]	Etzioni O, Decario N. AI can help scientists find a COVID-19 vaccine[EB/OL]. [2021-06-16]. https://www.wired.com/story/opinion-ai-can-help-find-scientists-find-a-covid-19-vaccine.
[2]	Laguarta J, Hueto F, Subirana B. COVID-19 artificial intelligence diagnosis using only cough recordings[J]. IEEE Open J Eng Med Biol, 2020, 1: 275-281. doi: 10.1109/OJEMB.2020.3026928
[3]	Zeeberg A. D.I.Y. Artificial intelligence comes to a Japanese family farm[EB/OL]. [2021-06-16]. https://www.newyorker.com/tech/annals-of-technology/diy-artificial-intelligence-comes-to-a-japanese-family-farm.
[4]	Bengio Y. Learning deep architectures for AI[G]. Foundations and Trends® in Machine Learning, 2009, 2: 1-127.
[5]	Schmidhuber J. Deep learning in neural networks: An overview[J]. Neural Netw, 2015, 61: 85-117. http://www.onacademic.com/detail/journal_1000036789998910_729a.html
[6]	Zheng A, Casari A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists[M]. New York: O'Reilly Media Inc., 2018.
[7]	Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs[J]. JAMA, 2016, 316: 2402-2410. doi: 10.1001/jama.2016.17216
[8]	Burlina PM, Joshi N, Pekala M, et al. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks[J]. JAMA Ophthalmol, 2017, 135: 1170-1176. doi: 10.1001/jamaophthalmol.2017.3782
[9]	Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning[J]. Cell, 2018, 172: 1122-1131. e9. doi: 10.1016/j.cell.2018.02.010
[10]	Wei Q, Li X, Wang H, et al. Laser scar detection in fundus images using convolutional neural network[C]. ACCV, 2018: 191-206.
[11]	Lai X, Li X, Qian R, et al. Four models for automatic recognition of left and right eye in fundus images[C]. MMM, 2019: 507-517.
[12]	Xu C, Zhu X, He W, et al. Fully deep learning for slit-lamp photo based nuclear cataract grading[C]. MICCAI, 2019: 513-521.
[13]	Yang Z, Li X, He X, et al. Joint localization of optic disc and fovea in ultra-widefield fundus images[C]. MLMI, 2019: 453-460.
[14]	Wu J, Zhang Y, Wang J, et al. AttenNet: Deep attention based retinal disease classification in OCT images[C]. MMM, 2020: 565-576.
[15]	Ding F, Yang G, Wu J, et al. High-order attention networks for medical image segmentation[C]. MICCAI, 2020: 253-262.
[16]	Ding F, Yang G, Ding D, et al. Retinal nerve fiber layer defect detection with position guidance[C]. MICCAI, 2020: 745-754.
[17]	Wei Q, Li X, Yu W, et al. Learn to segment retinal lesions and beyond[C]. ICPR, 2020: 7403-7410.
[18]	Li X, Wan W, Zhou Y, et al. Deep multiple instance learning with spatial attention for ROP case classification, instance selection and abnormality localization[C]. ICPR, 2020: 7293-7298.
[19]	Li B, Chen H, Zhang B, et al. Development and evaluation of a deep learning model for the detection of multiple fundus diseases based on colour fundus photography[J]. Br J Ophthalmol, 2021. doi: 10.1136/bjophthalmol-2020-316290
[20]	Zhang C, He F, Li B, et al. Development of a deep-learning system for detection of lattice degeneration, retinal breaks, and retinal detachment in tessellated eyes using ultra-wide-field fundus images: A pilot study[J]. Graefes Arch Clin Exp Ophthalmol, 2021, 259: 2225-2234. doi: 10.1007/s00417-021-05105-3
[21]	Zhang C, Yang Z, He X, et al. Multimodal intelligence: Representation learning, information fusion, and applications[J]. IEEE J Sel Top Signal Process, 2020, 14: 478-493. doi: 10.1109/JSTSP.2020.2987728
[22]	Baltrušaitis T, Ahuja C, Morency LP. Multimodal machine learning: A survey and taxonomy[J]. IEEE Trans Pattern Anal Mach Intell, 2018, 41: 423-443. http://arxiv.org/pdf/1705.09406
[23]	Wang J, Tian K, Ding D, et al. Unsupervised domain expansion for visual categorization[J]. ACM Trans Multimedia Comput Commun Appl, 2021. https://arxiv.org/abs/2104.00233
[24]	Li X, Jia M, Islam MT, et al. Self-supervised feature learning via exploiting multi-modal data for retinal disease diagnosis[J]. IEEE Trans Med Imaging, 2020, 39: 4023-4033. doi: 10.1109/TMI.2020.3008871
[25]	Wang W, Xu Z, Yu W, et al. Two-stream CNN with loose pair training for multi-modal AMD categorization[C]. MICCAI, 2019: 156-164.
[26]	Chen RJ, Lu MY, Wang J, et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis[J]. IEEE Trans Med Imaging, 2020. doi: 10.1109/TMI.2020.3021387
[27]	Yang J, Yang Z, Mao Z, et al. Bi-modal deep learning for recognizing multiple retinal diseases based on color fundus photos and OCT images[C]. ARVO Annual Meeting, 2021.
[28]	Wang J, Miao J, Yang X, et al. Auto-weighting for breast cancer classification in multi-modal ultrasound[C]. MICCAI, 2020: 190-199.
[29]	Zhou T, Fu H, Zhang Y, et al. M2Net: Multi-modal multi-channel network for overall survival time prediction of brain tumor patients[C]. MICCAI, 2020: 221-231.
[30]	Jiang X, Luo Q, Wang Z, et al. Multiphase and multi-level selective feature fusion for automated pancreas segmentation from CT images[C]. MICCAI, 2020: 460-469.
[31]	Peng Y, Bi L, Fulham M, et al. Multi-modality information fusion for radiomics-based neural architecture search[C]. MICCAI, 2020: 763-771.
[32]	Wang W, Xu Z, Yu W, et al. Two-stream CNN with loose pair training for multi-modal AMD categorization[C]. MICCAI, 2019: 156-164.
[33]	Xu Z, Wang W, Yang J, et al. Automated diagnoses of age-related macular degeneration and polypoidal choroidal vasculopathy using bi-modal deep convolutional neural networks[J]. Br J Ophthalmol, 2021, 105: 561-566. doi: 10.1136/bjophthalmol-2020-315817
[34]	Zhu J, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. ICCV, 2017: 2223-2232.
[35]	Li X, Zhou Y, Wang J, et al. Multi-modal multi-instance learning for retinal disease recognition[C]. ACMMM, 2021. doi: 10.1145/3474085.3475418