Linking gene expression to clinical outcomes in pediatric Crohn's disease using machine learning

Sci Rep. 2024 Feb 1;14(1):2667. doi: 10.1038/s41598-024-52678-0.

Abstract

Pediatric Crohn's disease (CD) is characterized by a severe disease course with frequent complications. We sought to apply machine learning-based models to predict the risk of developing future complications in pediatric CD using ileal and colonic gene expression. Gene expression data were generated from 101 formalin-fixed, paraffin-embedded (FFPE) ileal and colonic biopsies obtained from treatment-naïve CD patients and controls. Clinical outcomes, including development of strictures or fistulas and progression to surgery, were analyzed using differential expression and modeled using machine learning. Differential expression analysis revealed downregulation of pathways related to inflammation and extracellular matrix production in patients with strictures. Machine learning-based models were able to incorporate colonic gene expression and clinical characteristics to predict outcomes with high accuracy. Models showed an area under the receiver operating characteristic curve (AUROC) of 0.84 for strictures, 0.83 for remission, and 0.75 for surgery. Genes with potential prognostic importance for strictures (REG1A, MMP3, and DUOX2) were not identified in single-gene differential analysis but were found to have strong contributions to predictive models. Our findings in FFPE tissue support the importance of colonic gene expression and the potential for machine learning-based models in predicting outcomes for pediatric CD.
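
For readers unfamiliar with the AUROC values quoted in the abstract, the following minimal sketch shows one standard way to compute the metric: the fraction of (positive, negative) patient pairs in which the positive case receives the higher predicted score, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical illustrations. They are not data, code, or methods from the study.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 outcomes (1 = event, e.g. a complication occurred)
    scores: iterable of predicted risks, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up values for illustration only (NOT data from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

Under this definition, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases with and without the outcome.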

MeSH terms

  • Child
  • Constriction, Pathologic
  • Crohn Disease* / pathology
  • Gene Expression
  • Humans
  • Lithostathine / genetics
  • Machine Learning

Substances

  • REG1A protein, human
  • Lithostathine

Supplementary concepts

  • Pediatric Crohn's disease