Software

The software that we have developed falls into one of the following categories.

Please see the group’s GitHub page for more details.

RNA design

DERNA DERNA designs an RNA sequence $\mathbf{v}$ that encodes a target protein $\mathbf{w}$, balancing minimum free energy (MFE) and codon adaptation index (CAI). Balancing these two objectives plays an important role in mRNA vaccine design.
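One of the two objectives DERNA balances, the codon adaptation index, has a short standard definition: each codon $c$ gets a relative adaptiveness $w(c)$ equal to its reference count divided by the largest count among codons synonymous with it, and the CAI of a coding sequence is the geometric mean of $w$ over its codons. The minimal sketch below computes only that quantity. It does not perform DERNA's joint MFE/CAI optimization. The helper names and the toy codon counts are hypothetical, and it assumes every codon has a positive count so that every $\log w(c)$ is defined.

```python
import math


def relative_adaptiveness(codon_counts, synonyms):
    """w(c) = count(c) / max count among codons encoding the same amino acid.

    codon_counts: dict codon -> reference count (hypothetical toy data here).
    synonyms: dict amino acid -> list of codons encoding it.
    """
    w = {}
    for aa, codons in synonyms.items():
        best = max(codon_counts[c] for c in codons)
        for c in codons:
            w[c] = codon_counts[c] / best
    return w


def cai(seq, w):
    """Geometric mean of w(c) over the codons of seq (length divisible by 3)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    log_sum = sum(math.log(w[c]) for c in codons)
    return math.exp(log_sum / len(codons))


# Toy example (made-up counts): two amino acids, two codons each.
synonyms = {"K": ["AAA", "AAG"], "F": ["UUU", "UUC"]}
counts = {"AAA": 30, "AAG": 70, "UUU": 40, "UUC": 60}
w = relative_adaptiveness(counts, synonyms)
print(cai("AAGUUC", w))  # only preferred codons, so CAI is 1.0
print(cai("AAAUUU", w))  # only rare codons, so CAI is lower
```

The two calls show what CAI rewards. A sequence built only from the most frequent synonymous codons scores 1, and any rarer codon pulls the geometric mean down.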
MASS MASS summarizes an ensemble of RNA secondary structures by computing a Maximum Agreement Secondary Structure, a compact representation of the structural features shared across the ensemble.

Cancer genomics

Sapling Sapling infers a small set of backbone trees on a subset of mutations that collectively summarize the entire set of possible phylogenies. Sapling can also grow backbone trees into full phylogenies, and can directly output full phylogenies up to a fraction $1-\rho$ away from optimality.
Dolphyin Dolphyin takes a binary matrix of single-cell SNV sequencing data and outputs a $1$-Dollo phylogeny: a rooted tree on which each character (SNV) is gained at most once and lost at most once.
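The Dollo constraint can be checked on any labeled tree, which makes the model concrete. The sketch below tests whether a rooted tree with binary character states at every node is a $k$-Dollo phylogeny. It counts, for each character, the edges where the state switches from 0 to 1 (gains) and from 1 to 0 (losses). Setting $k = 1$ gives the 1-Dollo model used by Dolphyin, and setting $k = 0$ gives a perfect phylogeny. This only validates a given labeling and is not Dolphyin's inference algorithm. The function name, the parent-dictionary tree encoding, and the all-zero root are assumptions made for illustration.

```python
def is_k_dollo(parent, states, k):
    """Check the k-Dollo constraint on a labeled rooted tree (hypothetical helper).

    parent: dict node -> parent node (the root maps to None).
    states: dict node -> tuple of 0/1 character states.
    Assumes the root carries the all-zero state (no mutations).
    Returns True iff every character is gained at most once and lost at most k times.
    """
    root = next(v for v, p in parent.items() if p is None)
    if any(states[root]):
        return False
    m = len(states[root])
    gains, losses = [0] * m, [0] * m
    for child, par in parent.items():
        if par is None:
            continue
        for c in range(m):
            a, b = states[par][c], states[child][c]
            if a == 0 and b == 1:
                gains[c] += 1  # character c acquired on edge (par, child)
            elif a == 1 and b == 0:
                losses[c] += 1  # character c lost on edge (par, child)
    return all(g <= 1 for g in gains) and all(l <= k for l in losses)


# Character 0 is gained on edge r->x and lost on edge x->z.
parent = {"r": None, "x": "r", "y": "x", "z": "x"}
states = {"r": (0, 0), "x": (1, 0), "y": (1, 1), "z": (0, 0)}
print(is_k_dollo(parent, states, 1))  # True: one gain and one loss per character
print(is_k_dollo(parent, states, 0))  # False: a perfect phylogeny allows no losses
```

Because the root state is all zeros, a loss can only follow a gain. Requiring at most one gain per character also rules out re-acquiring a lost character.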
fastppm fastppm (fast perfect phylogeny mixtures) is a library for fast estimation of the frequency matrix $F \in [0,1]^{m \times n}$ given variant and total read count matrices $V, D \in \mathbb{N}^{m \times n}$ over a fixed $n$-clonal tree $T$. It supports $\ell_1$, $\ell_2$, binomial, and beta-binomial losses.
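In a perfect phylogeny mixture, the frequencies in $F$ must be consistent with the tree $T$. In each sample $p$, the implied clone proportion (usage) $u_{p,i} = f_{p,i} - \sum_{j \in \delta(i)} f_{p,j}$ must be non-negative, where $\delta(i)$ is the set of children of $i$ in $T$. The root frequency must also be at most 1. The minimal sketch below checks these constraints and evaluates the $\ell_2$ loss against the naive estimate $V/D$. It does not perform fastppm's constrained optimization, and the helper names and the children-dictionary tree encoding are hypothetical.

```python
def usages(F, children):
    """u[p][i] = F[p][i] - sum of F[p][j] over the children j of clone i in T."""
    m, n = len(F), len(F[0])
    return [[F[p][i] - sum(F[p][j] for j in children.get(i, [])) for i in range(n)]
            for p in range(m)]


def is_ppm_feasible(F, children, root, tol=1e-9):
    """True iff all usages are non-negative and the root frequency is at most 1."""
    nonneg = all(u >= -tol for row in usages(F, children) for u in row)
    root_ok = all(row[root] <= 1 + tol for row in F)
    return nonneg and root_ok


def l2_loss(F, V, D):
    """Sum of squared differences between F and the naive estimate V / D."""
    return sum((F[p][i] - V[p][i] / D[p][i]) ** 2
               for p in range(len(F)) for i in range(len(F[0])))


# Tree: clone 0 is the root with children 1 and 2; one sample (m = 1).
children = {0: [1, 2]}
print(is_ppm_feasible([[0.9, 0.5, 0.3]], children, root=0))  # True (root usage 0.1)
print(is_ppm_feasible([[0.6, 0.5, 0.3]], children, root=0))  # False (root usage -0.2)
```

Roughly speaking, fitting $F$ means finding the feasible frequency matrix that minimizes one of the listed losses (here $\ell_2$) with respect to the read counts.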
MACH2 MACH2 infers optimal migration histories of metastatic cancer from clonal trees and observed clone locations. It supports multiple parsimony criteria, including the unobserved clone criterion, and guarantees optimality and completeness by systematically enumerating solutions.
Pharming Pharming is a heuristic algorithm that jointly infers clonal trees with SNV and CNA genotypes from low-pass single-cell DNA sequencing.
CNRein CNRein is a deep-reinforcement-learning, evolution-aware algorithm for haplotype-specific copy-number calling on single-cell DNA sequencing data.
Phertilizer Phertilizer infers a clonal tree with SNV genotypes and a cell clustering given ultra-low-coverage single-cell DNA sequencing data.
PCCH PCCH (Parsimonious Consistent Comigration History) infers a migration history from a given phylogeny whose leaves are labeled by locations.
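The basic parsimony step behind labeling a phylogeny's internal vertices by locations is classic Sankoff small parsimony. Given locations at the leaves, it assigns locations to internal vertices so as to minimize the number of migrations (edges whose endpoints have different locations). The sketch below implements only that unit-cost criterion. PCCH, MACH2 and MACHINA additionally consider comigrations, seeding sites and consistency constraints that this sketch ignores. The function name and tree encoding are hypothetical.

```python
import math


def sankoff_migrations(children, root, leaf_label, locations):
    """Minimum-migration location labeling of a rooted tree (Sankoff, unit costs).

    children: dict vertex -> list of children (leaves have no entry).
    leaf_label: dict leaf -> observed location.
    Returns (minimum number of migrations, dict vertex -> location).
    """
    cost = {}

    def solve(v):
        kids = children.get(v, [])
        if not kids:
            cost[v] = {s: 0 if s == leaf_label[v] else math.inf for s in locations}
            return
        for c in kids:
            solve(c)
        # Best cost of the subtree at v if v is placed at location s.
        cost[v] = {s: sum(min(cost[c][t] + (t != s) for t in locations) for c in kids)
                   for s in locations}

    solve(root)
    labels = {}

    def assign(v, s):
        labels[v] = s
        for c in children.get(v, []):
            assign(c, min(locations, key=lambda t: cost[c][t] + (t != s)))

    root_loc = min(locations, key=lambda s: cost[root][s])
    assign(root, root_loc)
    return cost[root][root_loc], labels


children = {"r": ["a", "x"], "x": ["b", "c"]}
leaf_label = {"a": "P", "b": "M1", "c": "M1"}
best, labels = sankoff_migrations(children, "r", leaf_label, ["P", "M1"])
print(best, labels)  # one migration, from P (at r) to M1 (at x)
```

Ties between equally parsimonious labelings are broken by the order of `locations`. Enumerating all optimal labelings, as MACH2 does for its criteria, requires keeping every minimizer rather than one.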
CloMu CloMu is a neural-network-based algorithm for modeling and predicting cancer clonal evolution.
CNAViz CNAViz is an interactive webtool for user-guided segmentation of tumor DNA sequencing data.
PACTION PACTION solves the Parsimonious Clone Tree Integration problem, inferring an integrated clone tree $T$ and proportions $U$ given input clone trees $T_1, T_2$ with clones $\Pi_1, \Pi_2$ and proportions $U_1, U_2$.
doubletD doubletD detects doublets in medium- to high-coverage single-cell DNA sequencing data.
RECAP RECAP is an iterative, heuristic algorithm for the Multiple Choice Consensus Tree (MCCT) problem. Given a family of patient-specific tumor phylogenies and an integer $k > 0$, RECAP returns an expanded tumor phylogeny per patient, an assignment of patients into $k$ clusters, and a consensus tree per cluster summarizing repeated evolutionary patterns.
Phyolin Phyolin tests for linear evolution given single-cell DNA sequencing data of a tumor. Under the assumption that the phylogeny is linear, Phyolin outputs an estimated false-negative rate, an inferred linear perfect phylogeny, and a classification of the tree topology.
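In the error-free case, the combinatorial core of "linear evolution" is simple. A binary cell-by-mutation matrix admits a linear perfect phylogeny (a path) exactly when the sets of cells carrying each mutation are nested, that is, they form a chain under inclusion. Earlier mutations on the path are carried by supersets of the cells carrying later ones. The sketch below checks this condition and returns the mutation order. Phyolin itself handles false negatives statistically, which this sketch does not, and the function name is hypothetical.

```python
def linear_order(B):
    """Return a mutation order along a linear perfect phylogeny, or None.

    B: list of rows (cells), each a list of 0/1 entries (mutations).
    Assumes error-free data: the cell sets of the mutations must form a chain.
    """
    n = len(B[0])
    cell_sets = [frozenset(i for i, row in enumerate(B) if row[j]) for j in range(n)]
    # Mutations carried by more cells must occur earlier on the path.
    order = sorted(range(n), key=lambda j: -len(cell_sets[j]))
    for a, b in zip(order, order[1:]):
        if not cell_sets[b] <= cell_sets[a]:
            return None  # two incomparable cell sets imply branching
    return order


print(linear_order([[1, 0, 0], [1, 1, 0], [1, 1, 1]]))  # [0, 1, 2]
print(linear_order([[1, 1, 0], [1, 0, 1]]))             # None (branching)
```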
PhyDOSE Given candidate tumor phylogenies, a frequency matrix from bulk data, and a confidence level, PhyDOSE computes the minimum number of single cells needed in a follow-up single-cell sequencing experiment to identify the true phylogeny with the desired confidence.
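The flavor of this sample-size question can be seen in a deliberately simplified case that is not PhyDOSE's actual computation. Suppose distinguishing the candidate trees only requires observing at least one cell from a single clone with proportion $p$. With $k$ cells sampled independently, that event has probability $1-(1-p)^k$. The smallest $k$ reaching confidence $\gamma$ is therefore $\lceil \log(1-\gamma) / \log(1-p) \rceil$. PhyDOSE works with features that distinguish the candidate trees, which this single-clone sketch does not model. The function name is hypothetical.

```python
import math


def min_cells(p, gamma):
    """Smallest k with 1 - (1 - p)**k >= gamma, for 0 < p < 1 and 0 < gamma < 1.

    Simplified model: cells are sampled independently, and success means seeing
    at least one cell from a single clone of proportion p.
    """
    k = max(0, math.ceil(math.log(1 - gamma) / math.log(1 - p)))
    # Guard against floating-point rounding in the closed form.
    while 1 - (1 - p) ** k < gamma:
        k += 1
    while k > 0 and 1 - (1 - p) ** (k - 1) >= gamma:
        k -= 1
    return k


print(min_cells(0.1, 0.95))  # 29 cells for a 10% clone at 95% confidence
```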
MCT MCT contains exact and heuristic algorithms for the Multiple Consensus Tree problem, which arises in summarizing the solution space of tumor phylogeny inference.
PhySigs PhySigs is an R package that solves the Tree-constrained Exposure inference problem, which identifies a small number of exposure shifts along the edges of a given tumor phylogeny. This problem arises in the analysis of the dynamics of mutational signature exposures in cancer, and has both therapeutic and basic science applications.
ClonArch ClonArch is a web-based tool to interactively visualize the phylogenetic tree and spatial distribution of clones in a single tumor mass. ClonArch uses the marching squares algorithm to draw closed boundaries representing the presence of a clone in a tumor.
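The marching squares step can be sketched on a binary presence grid for one clone. Each 2x2 block of grid values is encoded as a 4-bit case index, and a lookup table gives the boundary segments joining midpoints of the block's edges. The ambiguous saddle cases (5 and 10) are resolved here by separating the two "present" corners, which is one of two valid conventions. This is a generic illustration, not ClonArch's code. The function and table names are hypothetical, and the grid is assumed to be padded with zeros so that every boundary closes.

```python
# Case index -> pairs of block edges (T, R, B, L) joined by a boundary segment.
# Bits: top-left = 8, top-right = 4, bottom-right = 2, bottom-left = 1.
CASES = {
    0: [], 1: [("L", "B")], 2: [("B", "R")], 3: [("L", "R")],
    4: [("T", "R")], 5: [("T", "R"), ("L", "B")], 6: [("T", "B")], 7: [("L", "T")],
    8: [("L", "T")], 9: [("T", "B")], 10: [("T", "L"), ("B", "R")], 11: [("T", "R")],
    12: [("L", "R")], 13: [("B", "R")], 14: [("L", "B")], 15: [],
}


def marching_squares(grid):
    """Boundary segments around the 1-entries of a binary grid.

    Points are (x, y) = (column, row); each segment joins two edge midpoints.
    """
    segments = []
    for i in range(len(grid) - 1):
        for j in range(len(grid[0]) - 1):
            idx = ((grid[i][j] << 3) | (grid[i][j + 1] << 2)
                   | (grid[i + 1][j + 1] << 1) | grid[i + 1][j])
            mid = {"T": (j + 0.5, i), "R": (j + 1, i + 0.5),
                   "B": (j + 0.5, i + 1), "L": (j, i + 0.5)}
            for a, b in CASES[idx]:
                segments.append((mid[a], mid[b]))
    return segments


# A clone present at the center of a zero-padded 3x3 grid yields a closed diamond.
print(marching_squares([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))
```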
PMH-S PMH-S is a fixed-parameter tractable (FPT) algorithm for the Parsimonious Migration History problem for the case where the migration graph is restricted to a tree (i.e., $\mathcal{P} = \{S\}$). The asymptotic running time is exponential in the number $m$ of locations.
SPhyR SPhyR is an algorithm for reconstructing phylogenetic trees from single-cell DNA sequencing data. SPhyR employs the $k$-Dollo phylogeny model, where each single-nucleotide variant is gained at most once and lost at most $k$ times.
MACHINA MACHINA is a computational framework for inferring migration patterns between a primary tumor and metastases using DNA sequencing data.
SPRUCE SPRUCE reconstructs the evolutionary history of a tumor from multi-sample bulk DNA sequencing data, incorporating both single-nucleotide variants and copy-number aberrations. SPRUCE uses exhaustive combinatorial enumeration.
AncesTree AncesTree reconstructs the evolutionary history of a tumor from multi-sample bulk DNA sequencing data. AncesTree incorporates single-nucleotide variants under the infinite sites assumption.
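Under the infinite sites assumption, a mutation can only be an ancestor of another if its frequency is at least as large in every sample. These pairwise constraints form an ancestry graph, and the candidate phylogenies are (roughly) spanning arborescences of that graph that also satisfy the sum condition. The sketch below builds only the ancestry graph from a sample-by-mutation frequency matrix. The function name is hypothetical, and the tree search that follows is not shown.

```python
def ancestry_graph(F):
    """Edges (i, j), i != j, with F[p][i] >= F[p][j] in every sample p.

    F: list of samples (rows), each a list of mutation frequencies (columns).
    An edge (i, j) means mutation i may be an ancestor of mutation j.
    """
    m, n = len(F), len(F[0])
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and all(F[p][i] >= F[p][j] for p in range(m))]


F = [[0.8, 0.5, 0.3],
     [0.6, 0.2, 0.4]]
print(ancestry_graph(F))  # [(0, 1), (0, 2)]: mutations 1 and 2 are incomparable
```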

Infection genomics

TRIBAL TRIBAL infers B-cell clonal lineages and isotype transition probabilities given isotypes and BCR sequences of $n$ B cells clustered into $k$ clonotypes.
CORSID CORSID identifies transcription regulatory sequence (TRS) sites, the core sequence, and gene locations given an unannotated coronavirus genome.
JUMPER JUMPER infers transcripts $\mathcal{T}$ and their abundances $\mathbf{c}$ with maximum likelihood from an alignment of (paired-end) reads to a Nidovirales genome, solving the Discontinuous Transcript Assembly problem.
TiTUS TiTUS takes a timed pathogen phylogeny whose leaves are labeled by hosts along with epidemiological data, and counts and uniformly samples feasible interval vertex labelings that satisfy direct-transmission and weak-bottleneck constraints.
SharpTNI SharpTNI is a tool for counting and sampling solutions from the space of parsimonious transmission networks under a weak transmission bottleneck constraint. This problem arises in phylodynamic and phylogeographic analyses of disease outbreaks.

Network biology

eXamine eXamine is a Cytoscape 3.x app that displays set membership as contours on top of a node-link layout. Source code is available on GitHub.
xHeinz xHeinz identifies conserved dysregulated network modules given differential expression data from two organisms and their protein/gene interaction networks.
Heinz Heinz identifies dysregulated network modules given differential expression data and a protein/gene interaction network.
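Heinz casts module identification as a maximum-weight connected subgraph (MWCS) problem. Genes receive positive or negative scores derived from the expression data, and the goal is a connected set of genes with maximum total score. To make that objective concrete, the sketch below solves tiny instances by brute force over all vertex subsets. This is exponential and is not Heinz's exact algorithm, which scales to genome-wide networks. The function names are hypothetical, and the node scores are assumed to be given.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Depth-first search restricted to `nodes`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj.get(v, ()):
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes


def max_weight_connected_subgraph(weights, edges):
    """Brute-force MWCS: best total weight over all connected vertex subsets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best, best_set = float("-inf"), None
    vertices = list(weights)
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            if is_connected(subset, adj):
                w = sum(weights[v] for v in subset)
                if w > best:
                    best, best_set = w, set(subset)
    return best, best_set


edges = [("a", "b"), ("b", "c")]
print(max_weight_connected_subgraph({"a": 2, "b": -1, "c": 3}, edges))  # 4, all three
print(max_weight_connected_subgraph({"a": 2, "b": -5, "c": 3}, edges))  # 3, only c
```

The two calls show the trade-off the objective encodes. A negatively scored gene is included only when it connects enough positive weight to pay for itself.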
Natalie 2.0 Natalie 2.0 is a method for network alignment based on Lagrangian relaxation.
NatalieQ NatalieQ is a web server for protein-protein interaction network querying.

Miscellaneous

CSO The Crossing Schedule Optimization (CSO) problem arises in the context of marker-assisted plant breeding, where we are given a desired diploid genotype $C^*$ in the form of a $2 \times m$ binary matrix. In addition, we are given a set $\{C^1,\ldots,C^n\}$ of $n$ parental genotypes, an $m \times m$ recombination probability matrix $R$ and a desired probability $\gamma$ of success. The task is to identify a crossing schedule (a DAG) that results in $C^*$ from $\{C^1,\ldots,C^n\}$ while minimizing the number of generations (depth), the number of crossings (internal vertices), and the total population size.
Automated Topology Builder The Automated Topology Builder (ATB) and Repository facilitate the development of molecular force fields for molecular dynamics or Monte Carlo simulations of biomolecular systems.