Biostatistics & Bioinformatics

A first effort has been made by comparing the performance of linear-regression-based statistical methods in assessing exposome-phenome associations. In an extensive simulation study using realistic empirical exposome data, we have shown that correlation between exposures is indeed a major challenge for exposome research, and that current statistical methods are limited in their ability to efficiently differentiate true predictors from correlated covariates. However, some currently available methods, such as GUESS and DSA, do provide a marginally better balance between sensitivity and false discovery proportion (FDP), although they did not outperform the other multivariate methods across all scenarios and properties examined. Future work in exposomics focuses on extending these methods to handle the aforementioned computational complexity and to incorporate external information from previous research, network ontologies, and biological pathways.
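The correlation problem can be illustrated with a minimal, standard-library-only Python sketch. This is not the simulation design of the study, and it does not implement GUESS or DSA. It generates hypothetical data in which exposure 0 is the only true predictor, exposure 1 is a covariate strongly correlated with it, and the remaining exposures are independent noise. It then screens each exposure univariately, using a Fisher z approximation for the p-value, and scores the selection by sensitivity and FDP. The function names (`sensitivity_fdp`, `univariate_pvalue`, `simulate`) and all parameter values are illustrative assumptions.

```python
import math
import random


def sensitivity_fdp(selected, truth):
    """Return (sensitivity, false discovery proportion) of a selected set."""
    selected, truth = set(selected), set(truth)
    true_pos = len(selected & truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    # FDP is taken as 0 when nothing is selected
    fdp = (len(selected) - true_pos) / len(selected) if selected else 0.0
    return sensitivity, fdp


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def univariate_pvalue(x, y):
    """Two-sided p-value for zero correlation (Fisher z approximation)."""
    r = pearson(x, y)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n=500, rho=0.9, beta=0.5, n_noise=8, alpha=0.05, seed=1):
    """Hypothetical scenario: exposure 0 drives the outcome, exposure 1 is
    correlated with exposure 0 (corr ~ rho), the rest are pure noise."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x0]
    noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_noise)]
    exposures = [x0, x1] + noise
    y = [beta * a + rng.gauss(0, 1) for a in x0]
    # Univariate screen: keep every exposure with p < alpha
    selected = {j for j, x in enumerate(exposures)
                if univariate_pvalue(x, y) < alpha}
    return selected, sensitivity_fdp(selected, {0})


selected, (sens, fdp) = simulate()
print(sorted(selected), sens, round(fdp, 2))
```

Because the correlated covariate is almost always picked up alongside the true predictor, the univariate screen reaches full sensitivity at the cost of an FDP of at least 0.5 in this scenario. This is the kind of trade-off that multivariate methods aim to improve.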

Exposure calibration

The multidisciplinary expertise of the EXPOsOMICS group enables the collection, in several populations, of modeled exposures based on refined models and cutting-edge technologies, as well as PEM. In samples where both types of data are available, we derive calibration coefficients that optimize the prediction of true exposure (as measured in PEM studies) from combinations of variables in the modeled exposure matrices. Statistical methods include classical measurement error models and their Bayesian alternatives, which provide calibrated exposure estimates for air and water pollutants. The resulting estimates are used within EXPOsOMICS for:

  • characterizing the internal response to the external component of the exposome;
  • studying the association between the calibrated exposures and disease endpoints.
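The calibration step can be sketched as simple regression calibration, which is a swapped-in illustration and not necessarily the exact model used in EXPOsOMICS. In a validation subsample, the PEM measurement is regressed on the modeled-exposure variables by ordinary least squares. The fitted coefficients are then applied to subjects who have only modeled exposures. The Bayesian alternatives mentioned above are not shown. The helper names `ols` and `calibrate` and the toy data are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares with intercept, solved via the normal equations.
    X is a list of rows (the modeled-exposure variables of one subject)."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept
    p = len(A[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    # Last column now holds the coefficients [intercept, b1, b2, ...]
    return [M[i][p] for i in range(p)]


def calibrate(coefs, row):
    """Calibrated (predicted true) exposure for one subject's modeled variables."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical validation subsample: two modeled-exposure variables per subject
# and the matching PEM values (noise-free toy numbers, so the fit is exact).
X_val = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [6, 4]]
pem = [3.0, 5.5, 7.5, 10.5, 12.0, 15.0]
coefs = ols(X_val, pem)  # approximately [1.0, 2.0, 0.5]

# Apply the calibration to subjects without PEM
calibrated = [calibrate(coefs, row) for row in [[2.5, 1.0], [7.0, 2.0]]]
```

With real data, the PEM values would be noisy and the coefficients would carry uncertainty. A measurement error or Bayesian model would then propagate that uncertainty into the downstream exposure-disease analyses.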

Development, validation and investigation of internal markers of external exposures

EXPOsOMICS has generated a wide range of omic profiles in populations for which PEM and calibrated exposures are also available. These analyses adapt methods developed for GWAS to the exposome setting. As a benchmark, we mainly use univariate approaches (linear or generalized linear models) combined with ad hoc multiple-testing correction and false discovery rate (FDR) control techniques. Bayesian variable selection methods, which typically seek the combination of markers (omic signals) that best predicts the outcome (exposure levels), are used here for the first time. These analyses also provide an estimate of the variance in individual responses at similar exposure levels. Finally, to identify potential common patterns in the internal signature of exposures across platforms, the correlations between the profiles from each platform are analyzed with respect to the external exposures. These ‘cross-omic’ analyses can identify features of the biological pathways linking exposure to disease risk, and rely on well-established network and clustering methods.
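For the univariate benchmark, one common FDR-control choice is the Benjamini-Hochberg step-up procedure. The text does not name the specific correction used, so this is an assumption. Each omic feature would be tested against the exposure with its own regression, and the resulting p-values would be passed to the procedure. The function name and the p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at FDR level q (valid for independent or positively
    dependent tests)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    return sorted(order[:k_max])


# Hypothetical p-values, one per omic feature, from feature ~ exposure models
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)  # -> [0, 1]
```

Because omic features are often strongly correlated, the independence or positive-dependence assumption behind this procedure may not hold exactly. This is one reason the univariate approach serves only as a benchmark.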