Data visualization tools depend on analysis pipelines, reference datasets, visualization components, and cloud infrastructure. Rather than reinvent the wheel, we use and contribute to several open source projects. We are currently building a visualization platform with support for single-cell analysis. As we complete key pieces of the infrastructure, we will release all source code under the permissive MIT License.

Single Cell Data Visualization Platform - High Level System Design

Single Cell Sequencing Pipelines

Experimental methods for single-cell RNA sequencing (scRNA-seq) are becoming accessible to more and more laboratories, but computational pipelines for processing the raw data files remain limited. To make scRNA-seq analysis more accessible to our community, we are implementing a cloud-based single-cell pipeline. The pipeline performs analyses such as dimensionality reduction, clustering, differential expression, trajectory analysis, and cell type calling, drawing on several of the most trusted analytical toolkits, including Monocle, Seurat, and Scanpy.

Reference Data

What publications are associated with a gene? Is an A>C variant on C6 known to cause any disorders? Which network neighbors and pathways are associated with a given gene? Answering these questions requires reference datasets. To date, we have curated approximately 2 TB of reference data from more than 25 public data marts and made it available programmatically. Every new project brings in new data, so this resource keeps growing richer.

Web Framework

We strive to make academic research data more discoverable and useful to amateurs and professionals alike. Our web publication framework lets researchers create rich, interactive websites with ease. Whether you want to publish a data atlas, promote a recent publication, or share knowledge with collaborators, simply upload your data and customize your site with a simple text editor. Let us worry about the code and the cloud.

Cloud Compute

As sequencing costs fall, data resolution increases, and the methods used to interrogate that data become more powerful, we are rapidly approaching a time when biostatisticians will no longer be able to work on their desktop computers. To address these needs, we are developing serverless batch-processing frameworks as well as distributed, in-memory capabilities for ad hoc analysis.