Yesterday, we discussed a useful stem cell tool called the CIRM iPSC Repository, which will contain over 3000 human induced pluripotent stem cell (iPSC) lines – from patients and healthy individuals – that contain a wealth of information about human diseases. Now that scientists have access to these lines, they need the proper tools to study them. This is where CIRM’s Genomics Initiative comes into play.
Crunching stem cell data
In 2014, CIRM funded the Genomics Initiative, which created the Center of Excellence in Stem Cell Genomics (CESCG). The goal of the CESCG is to develop novel genomics and bioinformatics tools specifically for stem cell research. These technologies aim to advance our fundamental understanding of human development and disease mechanisms, improve current cell and tissue production methods, and accelerate personalized stem cell-based therapies.
The CESCG is a consortium of Stanford University, the Salk Institute and UC Santa Cruz. Together, the groups oversee or support more than 20 different research projects throughout California focused on generating and analyzing sequencing data from stem or progenitor cells. Sequencing technology today is used not only to decode DNA, but also to study other genomic data that provide information about how gene activity is regulated.
Many of the projects within the CESCG are using these sequencing techniques to define the basic genetic properties of specific cell types, and will use this information to create better iPSC-based tissue models. For example, scientists can determine what genes are turned on or off in cells by analyzing raw data from RNA sequencing experiments. (RNA is like a photocopy of DNA sequences and is the cell’s way of carrying out the instructions contained in the DNA. RNA sequencing identifies all the RNA that is generated in a tissue or cell at a specific moment.) Single cell RNA sequencing, made possible by techniques such as Drop-seq mentioned in yesterday’s blog, is now further revealing the diversity of cell types within tissues and creating more exact reference RNA sequences to identify a specific cell type. By comparing RNA sequencing data from single cells of stem cell-based models to these reference cell types, researchers can estimate how accurate, or physiologically relevant, those stem cell models are.
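To make the idea of comparing a cell against reference cell types a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a CESCG tool: the reference profiles, the gene values and the names `pearson` and `closest_reference` are all hypothetical, and real analyses work with thousands of genes and far more careful statistics.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no information to correlate
    return cov / math.sqrt(var_x * var_y)


# Hypothetical reference profiles: made-up expression levels for four genes.
REFERENCE_PROFILES = {
    "cell_type_A": [9.0, 1.0, 0.5, 7.0],
    "cell_type_B": [0.5, 8.0, 9.0, 1.0],
}


def closest_reference(cell_expression, references):
    """Return the best-matching reference type and all correlation scores."""
    scores = {
        name: pearson(cell_expression, profile)
        for name, profile in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# A hypothetical stem cell-derived cell measured on the same four genes.
query_cell = [1.0, 7.5, 8.0, 0.5]
best_match, scores = closest_reference(query_cell, REFERENCE_PROFILES)
print(best_match, scores)
```

In this toy example the query cell tracks the second reference profile much more closely than the first, which is the kind of signal a researcher would read as "my cells look like this reference type."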
Such comparative analyses can only be done using powerful software that can compare millions of sequences at the same time. These activities, part of a field termed bioinformatics, make up a significant portion of the CESCG’s work, and several software tools are being created within the Initiative. Josh Stuart, a faculty member at the UC Santa Cruz School of Engineering and a primary investigator in the CESCG, explained his team’s vision:
“A major challenge in the field is recognizing cell types or different states of the same cell type from raw data. Another challenge is integrating multiple data sets from different labs and figuring out how to combine measurements from different technologies. At the CESCG, we’re developing bioinformatics models that trace through all this data. Our goal is to create a database of these traces where each dot is a cell and the curves through these dots explain how the cells are related to one another.”
Stuart’s hope is that scientists will input their stem cell data into the CESCG database and receive a scorecard that explains how accurate their cell model is based on a specific genetic profile. The scorecard will not only provide details on the identity of their cells, but will also show how they relate to other cell types found in the database.
The Brain of Cells
A good example of how this database will work is a project called the Brain of Cells (BOC). It’s a collection of single cell RNA sequencing data from thousands of fetal-derived brain cells provided by multiple labs. The idea is that researchers will input RNA sequencing data from the stem cell-derived brain cells they make in their labs, and the BOC will give them back a scorecard that describes what types of cells they are and their developmental state by comparing them to the reference brain cells.
One of the labs that is actively involved in this project and is providing the bulk of the BOC datasets is Arnold Kriegstein’s lab at UC San Francisco. Aparna Bhaduri, a postdoctoral fellow in the Kriegstein lab working on the BOC project, outlined the goal of the BOC and how it will benefit researchers:
“The goal of the Brain of Cells project is to find ways to leverage existing datasets to better understand the cells in the developing human brain. This tool will allow researchers to compare cell-based models (such as stem cell-derived 3D organoids) to the actual developing brain, and will create a query-able resource for researchers in the stem cell community.”
Pablo Cordero, a former postdoc in Josh Stuart’s lab who designed a bioinformatics tool used in BOC called SCIMITAR, explained how the BOC project is a useful exercise in combining single cell data from different external researchers into one map that can predict cell type or cell fate.
“There is no ‘industry standard’ at the moment,” said Cordero. “We have to find various ways to perform these analyses. Approximating the entire human cell lineage is the holy grail of regenerative medicine since in theory, we would have maps of gene circuits that guide cell fate decisions.”
Once the reference data from BOC is ready, the group will use a bioinformatics program called Sample Psychic to create the scorecards for outside researchers. Clay Fischer, project manager of the CESCG at UC Santa Cruz, described how Sample Psychic works:
“Sample Psychic can look at how often genes are being turned off and on in cells. It uses this information to produce a scorecard, which shows how closely the data from your cells maps up to the curated cell types and can be used to infer the probability of the cell type.”
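For readers who like to see the idea in code, the following Python sketch shows one simple way a scorecard of this kind could be computed. It is not Sample Psychic's actual algorithm, which is not described here: the on/off threshold, the `sharpness` parameter, the curated gene states and every function name are assumptions made up for illustration.

```python
import math


def on_off(expression, threshold=1.0):
    """Call a gene 'on' when its expression exceeds an assumed threshold."""
    return [value > threshold for value in expression]


def agreement(query_states, reference_states):
    """Fraction of genes whose on/off state matches the reference."""
    matches = sum(q == r for q, r in zip(query_states, reference_states))
    return matches / len(reference_states)


def scorecard(query_expression, curated_types, sharpness=5.0):
    """Turn agreement scores into values that sum to 1 (a softmax).

    These are rough, probability-like scores, not calibrated probabilities.
    """
    query_states = on_off(query_expression)
    agreements = {
        name: agreement(query_states, states)
        for name, states in curated_types.items()
    }
    weights = {
        name: math.exp(sharpness * score)
        for name, score in agreements.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical curated cell types: on/off states for five made-up genes.
CURATED_TYPES = {
    "cell_type_A": [True, False, False, True, True],
    "cell_type_B": [False, True, True, False, True],
}

card = scorecard([0.2, 6.0, 4.5, 0.0, 3.0], CURATED_TYPES)
for name, score in sorted(card.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Here the query cell's genes are switched on and off in the same pattern as the second curated type, so that type receives nearly all of the score.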
The BOC group believes that the analyses and data produced in this effort will be of great value to the research community and scientists interested in studying developmental neuroscience or neurodegeneration.
The Brain of Cells project is still in its early stages, but soon scientists will be able to use this nifty tool to help them build better and more accurate models of human brain development and brain-related diseases.
The CESCG is also pursuing data-driven stem cell projects focused on developing similar databases and scorecards for heart cells and pancreatic cells. These genomics and bioinformatics tools are moving the field toward a day when scientists can use computational analysis to connect the dots between how different cell states and cell fates are determined, and then leverage this information to generate better iPSC-based systems for disease modeling in the lab or therapeutics in the clinic.