
By Dr. Stephen Lin
Single-cell. It is the new buzzword in biology. Single-cell biology refers to the in-depth characterization of individual cells in an organ or similar microenvironment. Every organ, like the brain or heart, is composed of thousands to millions of cells. Single-cell biology breaks those organs down into their individual cell components to study the diversity within those cells. For example, the heart is composed of cardiomyocytes, but within that bulk population of cardiomyocytes there are specialized cardiomyocytes for the different chambers of the heart and others that control beating, plus others not even known yet. Single-cell studies characterize cell-to-cell variability in the body down to this level of detail to gain knowledge of tissues in a way that was not possible before.
The majority of single-cell studies are based on next-generation sequencing of genetic material such as DNA or RNA. The cost of sequencing each base of DNA or RNA has dropped precipitously since the first draft of the human genome was announced in 2000, a decline often compared to the trend seen with Moore’s Law in computing. As a result, it is now possible to sequence the full set of genes expressed in an individual cell, called the transcriptome, for thousands and thousands of cells.
The explosion of data coming from these technologies requires new approaches to study and analyze the information. The volume of genetic sequence data that can be generated is so large that it is often no longer possible for scientists to interpret it manually, as was traditionally done. To apply this exciting field to stem cell research and therapies, CIRM funded the Genomics Initiative, which created the Centers of Excellence in Stem Cell Genomics (CESCG). The goal of the CESCG is to generate novel genomic information and create new bioinformatics tools (i.e. computer software) specifically for stem cell research, some of which were highlighted in past blog posts. Some of the earliest single-cell gene expression atlases of the human body were created under the CESCG.
The latest study from CESCG investigators creates both new information and new tools for single-cell genomics. In work funded by the Genomics Initiative, Stephen Quake and colleagues at Stanford University and the Chan Zuckerberg Biohub studied tumor formation using single-cell approaches. The work draws on one of the earliest published single-cell studies, in which the team surveyed transcriptome diversity in the human brain, including samples of the brain cancer glioblastoma.
Recognizing that the data coming from these studies would eventually become too large and numerous for all of the cell types to be classified by hand, they created a new bioinformatics tool called Northstar, which applies artificial intelligence to automatically classify the cell types found in single-cell studies. The cell classifications generated by Northstar were similar to the original classifications created manually several years ago, including the identification of specific cancerous cells.
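For readers curious what automatic cell type classification can look like in practice, here is a minimal sketch in Python. It is not Northstar's actual algorithm or interface. It assumes a much simpler idea: compare each new cell to the average expression profile of each known cell type in a reference atlas, and leave the cell unassigned if nothing matches well. The gene values, the cell type names, the `classify_cell` function and the `min_similarity` cutoff are all made up for illustration.

```python
import math

# Hypothetical reference atlas: average expression of three made-up genes
# for each annotated cell type (toy numbers, not real data).
ATLAS = {
    "neuron": [9.0, 1.0, 0.5],
    "astrocyte": [1.0, 8.0, 0.5],
    "microglia": [0.5, 1.0, 9.0],
}


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no information about cell type
    return cov / (norm_x * norm_y)


def classify_cell(expression, atlas=ATLAS, min_similarity=0.8):
    """Return the best-matching atlas cell type, or "unassigned".

    min_similarity is an assumed cutoff: cells that resemble no atlas
    type closely are left unlabeled, since they may be a new cell type.
    """
    best_type, best_score = None, -1.0
    for cell_type, profile in atlas.items():
        score = pearson(expression, profile)
        if score > best_score:
            best_type, best_score = cell_type, score
    if best_score < min_similarity:
        return "unassigned"
    return best_type


if __name__ == "__main__":
    new_cells = {
        "cell_1": [8.0, 2.0, 1.0],  # closely resembles the neuron profile
        "cell_2": [4.0, 4.0, 1.0],  # ambiguous mix of two profiles
    }
    for name, expression in new_cells.items():
        print(name, classify_cell(expression))
```

In this toy example the first cell is labeled a neuron, while the second matches no atlas profile closely enough and is left unassigned. Setting aside poorly matched cells like this, rather than forcing a label onto them, is one simple way to leave room for cell types that are not in the atlas, such as tumor cells.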
Several features make Northstar a powerful bioinformatics tool for these studies: the software scales to large numbers of cells, it classifies cells quickly, and it needs relatively little computer processing power to go through literally millions of data points.
The scalability of the tool was demonstrated on the Tabula Muris data collection, a single-cell compendium of 20 mouse organs with data from over 200,000 cells. Finally, Northstar was used to classify the tumors in new single-cell data generated by the CESCG from samples of 11 pancreatic cancer patients obtained from Stanford Hospital. Northstar correctly traced the origins of cancerous cells in line with each patient's specific pancreatic cancer diagnosis, for example finding cancerous cells in the endocrine cell lineage in a patient diagnosed with neuroendocrine pancreatic cancer. Furthermore, Northstar identified previously unknown origins of cancerous cell clusters in other patients with pancreatic cancer. These new computational tools demonstrate how big data from genomic studies can become an important contributor to personalized medicine.
The full study was published in Nature.