The RAS Initiative Bioinformatics Research Team provides data processing and analytical support through three broad approaches:
- Processing and analysis of data from RAS Initiative colleagues or our collaborators
- Exploratory or hypothesis driven data mining for RAS Initiative colleagues
- Data mining of public data (e.g., TCGA, DepMap) for novel insights of RAS biology
We also created software tools to organize and interpret data produced by the RAS Initiative. We imported “big data” from public data repositories and analyzed them with a focus on the genes in the RAS pathway. By integrating internal and external data, we help improve understanding of RAS-driven cancers.
Data mining, analysis for drug discovery
The RAS Bioinformatics Research Team is integrally involved in drug discovery projects targeting KRAS and RAC1 and is responsible for processing raw data and analysis, data visualization and side-by-side comparison of selected mutant targets, and curve fitting for dosage data of selected candidate lead compounds.
The RAS Initiative, together with groups from the University of California, San Francisco, and the Broad Institute in Boston, have probed dozens of cancer cell lines for the effect of knocking down entire signaling nodes (such as all three RAS genes, all three RAF genes, etc.).
By data mining of public expression and survival outcome data from CCLE and TCGA as well as internal siREN data, we derived computational RAS-dependency indexes (RDIs) as putative measures of RAS dependency. Our RDIs correlate well with experimental data. The computational RDIs were expanded to cancer patients and have potential clinical applications.
Our team has analyzed the “big data” produced by The Cancer Genome Atlas in the context of mutant or wild-type RAS genes and found intriguing biological connections between mutation status and their expression levels.
Besides data mining and informatics support, we developed in-house tools, such as a survival analysis method that can help avoid a common pitfall of survival analysis. We have also built an architecture that stores, organizes, searches, and retrieves the data produced within the RAS Initiative.
Our capabilities and specializations
Our data mining efforts yield novel hypotheses in RAS biology that benefit the ongoing internal projects as well as the RAS research community.
- Analysis of siREN data (using pools of siRNAs to knock down signaling nodes in cancer cell lines) and integrating it with drug sensitivity data from the Genomics of Drug Sensitivity in Cancer
- Analysis of RAS and RAS pathway gene expression in tumor and normal samples in The Cancer Genome Atlas
- Data mining of public expression and survival outcome data from CCLE and TCGA as well as internal siREN data
- Data processing and analysis for the RAS Initiative and collaborators in probing dozens of cancer cell lines for the effect of knocking down entire signaling nodes
Informatics support
By providing sophisticated bioinformatic tools, we free the biologists from tedious data processing and complicated data analysis so they can focus on the wet-lab experiments. Our informatics support guides the interpretation of data and often provides direct insights into the underlying biology.
- Processing and analysis of the ongoing internal drug screening data
- Support and implement data mining projects with our colleagues from RAS program and NCI
- Ongoing data mining of expression, mutation, and CRISPR screening data from DepMap
- Development of tools to store, track, and retrieve data from RAS Initiative projects
Developing, maintaining data tools
The tools that help us fulfill our mission are mainly derived from public assets. These public or access-controlled cancer related databases and portals provide us the data resources for our data mining projects.
The servers and databases maintained by our organization provided us with the computational and analytical power for data processing and data analysis.
- Developed pathway analysis method, PPEP
- Developed survival analysis method, GradientScanSurv
- Programming, web, and scripting languages
- Public genomics and cancer data sets such as IGCG, TCGA, GENIE, and PCAWG
- Databases including Oracle, mysql, and Filemaker
- Cell line datasets such as nci60, CCLE, DepMap, COSMIC, and GDSC
- Data portals such as Biomart, cBIOportal, and caHUB
Collaborators
- University of California, San Francisco
- Department of Statistics, University of Illinois Urbana-Champaign
- Broad Institute in Boston