Artificial intelligence for prediction of treatment outcomes in breast cancer: Systematic review of design, reporting standards, and bias


      • Pitfalls in applying population-based data to individual patients are well-known.
      • AI-based algorithms may improve personalized treatment approaches in breast cancer.
      • However, methodological limitations may limit clinical impact.
      • We highlight reporting gaps, limited external validation, poor code/data sharing.
      • We provide solutions to ensure a robust evidence base in this emerging field.



      Artificial intelligence (AI) has the potential to personalize treatment strategies for patients with cancer. However, current methodological weaknesses could limit clinical impact. We identified common limitations and suggested potential solutions to facilitate translation of AI to breast cancer management.


      A systematic review was conducted in MEDLINE, Embase, SCOPUS, Google Scholar and PubMed Central in July 2021. Studies investigating the performance of AI to predict outcomes among patients undergoing treatment for breast cancer were included. Algorithm design and adherence to reporting standards were assessed following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool (PROBAST), and data and code availability were assessed through correspondence with authors.


      Our search identified 1,124 studies, of which 64 were included: 58 had a retrospective design and 6 a prospective design. Access to datasets and code was severely limited (unavailable in 77% and 88% of studies, respectively). On request, data and code were made available in 28% and 18% of cases, respectively. Ethnicity was often under-reported (not reported in 52 of 64, 81%), as was model calibration (63/64, 99%). The risk of bias was high in 72% (46/64) of the studies, mainly because of analysis bias.
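
      The proportions above can be recomputed from the stated counts. The following is a minimal Python sketch, not part of the review's own analysis: the dictionary labels, the helper names `percent` and `wilson_interval`, and the choice of a Wilson 95% interval are all assumptions made for illustration. Only the counts (52, 63 and 46 out of 64) come from the text.

```python
import math

# Counts transcribed from the results paragraph (64 included studies).
# The labels are illustrative, not the authors' variable names.
TOTAL_STUDIES = 64
REPORTED_COUNTS = {
    "ethnicity not reported": 52,
    "model calibration under-reported": 63,
    "high risk of bias": 46,
}


def percent(count, total):
    """Return count/total as a percentage."""
    return 100.0 * count / total


def wilson_interval(count, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%).

    This interval is an assumption added for illustration; the source
    reports point estimates only.
    """
    p = count / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, count in REPORTED_COUNTS.items():
        low, high = wilson_interval(count, TOTAL_STUDIES)
        print(
            f"{label}: {count}/{TOTAL_STUDIES} = "
            f"{percent(count, TOTAL_STUDIES):.1f}% "
            f"(Wilson 95% interval {100 * low:.1f}% to {100 * high:.1f}%)"
        )
```

      With only 64 studies the intervals are wide, which is worth bearing in mind when comparing these proportions with those from other reviews.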


      Development of AI algorithms should involve external and prospective validation, with improved code and data availability to enhance reliability and translation of this promising approach.
      Protocol registration number: PROSPERO - CRD42022292495.



