PAIRWISE NONLINEAR DEPENDENCE ANALYSIS OF GENOMIC DATA

Ann Appl Stat. 2023 Dec;17(4):2924-2943. doi: 10.1214/23-aoas1745. Epub 2023 Oct 30.

Abstract

The Cancer Genome Atlas (TCGA) data set contains many interesting nonlinear dependencies between the expression levels of pairs of genes, and these dependencies reveal important relationships among genes and subtypes of cancer. Detecting them requires a fast, powerful and interpretable procedure, especially in high-dimensional settings. We study nonlinear patterns in the expression of gene pairs from TCGA using Binary Expansion Testing, a powerful nonparametric dependence test. We find many nonlinear patterns; some are driven by known cancer subtypes, while others are novel.
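The abstract names Binary Expansion Testing without describing how it works. The sketch below illustrates the general binary-expansion idea under stated assumptions; it is not the authors' implementation or software. Each variable is rank-transformed, its first `depth` binary digits are recoded as ±1 signs, and products over every nonempty subset of those digits are formed. Summing cross products over observations gives "symmetry statistics": a large absolute value flags dependence, and the pair of digit products that attains it indicates which nonlinear pattern drives the signal. The function names, the default `depth=2`, the no-ties assumption and the permutation p-value are all hypothetical choices made for illustration. The published test may instead use an exact null distribution with its own multiplicity control.

```python
import itertools
import numpy as np


def digits(x, depth):
    """+/-1 binary-expansion digits of the empirical-CDF (rank) transform of x.

    Assumes no ties in x (illustrative simplification).
    """
    n = len(x)
    ranks = np.argsort(np.argsort(x))       # ranks 0..n-1
    cells = (ranks * 2**depth) // n         # integer cell index in 0..2**depth - 1
    # Most significant bit first; map bit 0 -> -1 and bit 1 -> +1.
    cols = [((cells >> (depth - 1 - k)) & 1) * 2 - 1 for k in range(depth)]
    return np.column_stack(cols)            # shape (n, depth)


def interactions(d):
    """Products of every nonempty subset of digit columns: shape (n, 2**depth - 1)."""
    depth = d.shape[1]
    out = []
    for r in range(1, depth + 1):
        for cols in itertools.combinations(range(depth), r):
            out.append(np.prod(d[:, list(cols)], axis=1))
    return np.column_stack(out)


def bet_perm_test(x, y, depth=2, n_perm=999, seed=0):
    """Max |symmetry statistic| over all cross interactions, with a permutation p-value.

    The permutation null is a hypothetical stand-in for the published test's null.
    """
    rng = np.random.default_rng(seed)
    ix = interactions(digits(x, depth))
    iy = interactions(digits(y, depth))
    sym = ix.T @ iy                         # symmetry statistics, one per cross interaction
    observed = np.abs(sym).max()
    # Index pair (x interaction, y interaction) that attains the maximum.
    pair = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(sym)), sym.shape))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(y))      # break any x-y dependence
        if np.abs(ix.T @ iy[perm]).max() >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value, pair
```

For a U-shaped relationship such as `y = (x - 0.5) ** 2`, the linear correlation is near zero. In this sketch, however, the product of the first two digits of `x` (interaction index 2) matches the first digit of `y` (index 0) for most observations. That match corresponds to the reading "values of `x` far from the center go with large `y`", which is the kind of interpretable pattern the abstract refers to.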

Keywords: Binary Expansion; Genomic data; Nonlinear dependence; Nonparametric dependence testing.