
Obtaining data from single-cell transcriptomic sequencing allows for the investigation of cell-specific gene expression patterns, which could not be addressed a few years ago. With the advancement of droplet-based protocols, the number of studied cells continues to increase rapidly. This establishes the need for software tools that efficiently process the resulting large-scale datasets. We address this need by presenting RainDrop for fast gene-cell count matrix computation from single-cell RNA-seq data produced by 10x Genomics Chromium technology. RainDrop can process single-cell transcriptomic datasets consisting of 784 million reads sequenced from around 8,000 cells in less than 40 minutes on a standard workstation. It significantly outperforms the established Cell Ranger pipeline and the recently introduced Alevin tool in terms of runtime, with a maximal (average) speedup of 30.4 (22.6) and 3.5 (2.4), respectively, while maintaining high agreement of the generated results. RainDrop is a software tool, written in C++, for highly efficient processing of large-scale droplet-based single-cell RNA-seq datasets on standard workstations.

Droplet-based single-cell RNA-seq (dscRNA-seq) protocols have gained increasing attention due to their ability to profile the transcriptome of thousands of cells in a single assay. Extracting information about gene expression, in terms of cDNA counts within certain cells, is a crucial processing step for further analysis such as clustering or imputation. The gained knowledge can be key to many biological research areas, e.g. the identification of genetic differences between cancerous and non-cancerous cells or finding connections between different cell types. Thus, the creation of gene-cell count matrices from dscRNA-seq data is of high technical importance. For each cell, this matrix shows the estimated count of each gene within that cell, based on the number of reads mapping to the corresponding transcript sequences. Since the number of studied cells continues to increase, this can be an enormously time-consuming task. Thus, efficient processing of the generated data is critical.

However, existing software tools often suffer from scalability issues due to the massive amounts of reads. For example, calculating a gene-cell expression matrix (also called feature-barcode matrix) with the popular Cell Ranger pipeline can take several hours even for a medium-sized input read dataset. This pipeline is based on alignments computed with STAR, which are expensive to calculate. The authors of Alevin recently used a different approach in their tool, relying on quasi-mappings computed with a suffix-array-based structure. They showed that this approach produces similar results while requiring only a fraction of the time compared to Cell Ranger. However, for larger datasets Alevin still needs a significant amount of time. Since the number of studied cells continues to increase, the corresponding runtime and memory requirements will become even more severe, especially for ultra-large datasets such as the recently published 1.3 million-cell dataset. We present RainDrop, a fast tool for the computation of gene-cell count matrices from dscRNA-seq data produced by 10x Genomics Chromium v2 protocols.
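
To make the notion of a gene-cell count matrix concrete, the following minimal Python sketch builds one from a handful of toy records. It is an illustration only and not RainDrop's implementation: it assumes that every read has already been assigned to a cell barcode and a gene, and that each read carries a molecular tag (here called `umi`) so that repeated reads of the same molecule are counted once. The function name `gene_cell_counts`, the record layout, and the toy barcodes and gene names are all hypothetical.

```python
from collections import defaultdict


def gene_cell_counts(assigned_reads):
    """Build a gene-cell count matrix from (barcode, umi, gene) tuples.

    Assumption: reads are already assigned to a cell barcode and a gene.
    Reads sharing the same barcode, umi, and gene are counted only once.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))  # gene -> barcode -> count
    for barcode, umi, gene in assigned_reads:
        key = (barcode, umi, gene)
        if key in seen:
            continue  # duplicate read of an already counted molecule
        seen.add(key)
        counts[gene][barcode] += 1

    genes = sorted(counts)
    barcodes = sorted({b for per_gene in counts.values() for b in per_gene})
    # Rows are genes, columns are cells (barcodes); missing entries are zero.
    matrix = [[counts[g].get(b, 0) for b in barcodes] for g in genes]
    return genes, barcodes, matrix


# Hypothetical toy input: two cells, two genes, one duplicated read.
reads = [
    ("AAAC", "U1", "GeneA"),
    ("AAAC", "U1", "GeneA"),  # duplicate of the previous record
    ("AAAC", "U2", "GeneA"),
    ("AAAC", "U3", "GeneB"),
    ("TTTG", "U1", "GeneB"),
]
genes, barcodes, matrix = gene_cell_counts(reads)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

In practice the expensive part is the step this sketch takes for granted, namely assigning hundreds of millions of reads to transcripts, which is where the tools discussed above differ.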
