Integrated Cluster Analysis with R in Drug Discovery Experiments using Multi-Source Data
useR!2017: Integrated Cluster Analysis with R in Dr...
- Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Belgium
- Independent consultant
Discovering the exact activities of a compound is of primary interest in drug development. A single drug can interact with multiple targets, and unintended drug-target interactions can lead to severe side effects. It is therefore valuable, in the early phases of drug discovery, not only to demonstrate the desired on-target efficacy of compounds but also to outline their unwanted off-target effects. The earlier unwanted behaviour is documented, the better; otherwise the drug could fail at a later stage, and the invested time, effort and money are lost.
In the early stages of drug development, different types of information on the compounds are collected: the chemical structures of the molecules (fingerprints), the predicted targets (target predictions), the results of various bioassays, the toxicity and more. An analysis of each data source on its own could reveal interesting yet disjoint information: it provides only a limited point of view and does not show how everything is interconnected in the global picture (Shi, De Moor, and Moreau 2009). A simultaneous analysis of multiple data sources can therefore provide more complete insight into the compounds' activity.
Analysis based on multiple data sources is a relatively new and growing area in drug discovery and drug development. Multi-source clustering procedures provide us with the opportunity to relate several data sources to each other to gain a better understanding of the mechanism of action of compounds. The use of multiple data sources was investigated in the QSTAR (quantitative structure transcriptional activity relationship) consortium (Ravindranath et al. 2015). The goal was to find associations between chemical, bioassay and transcriptomic data in the analysis of a set of compounds under development.
In the current study, we extend the clustering method presented in Perualila-Tan et al. (2016) and review the performance of several clustering methods on a real drug discovery project in R. We illustrate how the new clustering approaches provide valuable insight for the integration of chemical, bioassay and transcriptomic data in the analysis of a specific set of compounds. The proposed methods are implemented and publicly available in the R package IntClust, which is a wrapper package for a multitude of ensemble clustering methods.
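To make the idea of weighted-similarity clustering concrete, the following is a minimal sketch in base R. It does not use the IntClust package or its API. It assumes two binary data sources on the same compounds, a Tanimoto (Jaccard) dissimilarity per source, and a convex combination of the two with a weight `w`. The toy matrices, the compound names, the value of `w`, and the choice of average linkage are all illustrative assumptions, and the method in the cited paper may differ in its details.

```r
# Toy data (hypothetical): 6 compounds described by two binary data sources.
# Rows are compounds, columns are features (1 = bit set / target predicted).
fingerprints <- rbind(
  c(1, 1, 1, 0, 0, 0),
  c(1, 1, 0, 0, 0, 0),
  c(1, 1, 1, 1, 0, 0),
  c(0, 0, 0, 1, 1, 1),
  c(0, 0, 1, 1, 1, 1),
  c(0, 0, 0, 0, 1, 1)
)
targets <- rbind(
  c(1, 1, 0, 0),
  c(1, 1, 0, 0),
  c(1, 0, 0, 0),
  c(0, 0, 1, 1),
  c(0, 0, 1, 1),
  c(0, 1, 1, 1)
)
rownames(fingerprints) <- rownames(targets) <- paste0("cmpd", 1:6)

# For binary rows, dist(method = "binary") is the Jaccard distance,
# i.e. 1 minus the Tanimoto similarity.
d_fp <- dist(fingerprints, method = "binary")
d_tp <- dist(targets, method = "binary")

# Weighted combination of the two dissimilarity matrices.
# w is an assumed tuning parameter: w = 1 uses only the fingerprints,
# w = 0 uses only the target predictions.
w <- 0.5
d_w <- as.dist(w * as.matrix(d_fp) + (1 - w) * as.matrix(d_tp))

# Hierarchical clustering on the combined dissimilarity, cut into 2 clusters.
hc <- hclust(d_w, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

In this toy example, the first three compounds end up in one cluster and the last three in the other. Varying `w` between 0 and 1 shows how much each data source drives the grouping.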
References
Perualila-Tan, N., Z. Shkedy, W. Talloen, H. W. H. Goehlmann, QSTAR Consortium, M. Van Moerbeke, and A. Kasim. 2016. "Weighted-Similarity Based Clustering of Chemical Structure and Bioactivity Data in Early Drug Discovery." Journal of Bioinformatics and Computational Biology.
Ravindranath, A. C., N. Perualila-Tan, A. Kasim, G. Drakakis, S. Liggi, S. C. Brewerton, D. Mason, et al. 2015. "Connecting Gene Expression Data from Connectivity Map and in Silico Target Predictions for Small Molecule Mechanism-of-Action Analysis." Mol. BioSyst. 11 (1). The Royal Society of Chemistry: 86–96. doi:10.1039/C4MB00328D.
Shi, Y., B. De Moor, and Y. Moreau. 2009. "Clustering by Heterogeneous Data Fusion: Framework and Applications." NIPS Workshop.