We assume that the length of the sequence alignment, denoted , and the position of the root of the phylogeny are known. For now, let us assume that the data take the form of a bifurcating rooted phylogeny with branch lengths in units of substitutions per site and that all tip dates are known.

We build on recent advances in maximum likelihood and least-squares phylogenetic and molecular clock dating methods to develop a fast relaxed-clock method based on a Gamma-Poisson mixture model of substitution rates. We estimate confidence intervals for rates, dates, and tip dates using parametric and non-parametric bootstrap approaches. This method is implemented as an open-source R package.

Pathogen sequence data can provide important information about the timing and spread of infectious diseases, particularly for rapidly evolving pathogens such as RNA viruses.

This algorithm can be repeated for multiple starting conditions of the initial substitution rate to improve the quality of the estimate. The slope of the regression line is an estimate of the mean rate of substitution per unit time where the correlation due to shared ancestry has been neglected. This approach is implemented in the software good candidates for the root position.

In many real applications, dates of lineage sampling may not be known with certainty.
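The regression mentioned above, in which root-to-tip distance is regressed on sampling date and the slope is read as a mean substitution rate, can be illustrated with a minimal sketch. The function name `root_to_tip_rate` and the toy data are hypothetical and are not taken from any software discussed here. As in the text, the fit ignores the correlation among tips that arises from shared ancestry.

```python
import numpy as np

def root_to_tip_rate(tip_dates, root_to_tip_dist):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, root_date). The slope is a rough estimate of the mean
    substitution rate; root_date is the date at which the fitted line
    crosses zero distance. Tips are treated as independent observations.
    """
    t = np.asarray(tip_dates, dtype=float)
    d = np.asarray(root_to_tip_dist, dtype=float)
    t_mean, d_mean = t.mean(), d.mean()
    # Centering the dates keeps the arithmetic well conditioned.
    slope = np.sum((t - t_mean) * (d - d_mean)) / np.sum((t - t_mean) ** 2)
    root_date = t_mean - d_mean / slope
    return slope, root_date

# Hypothetical, noise-free example: rate 1e-3 subs/site/year, root in 2000.
dates = np.array([2005.0, 2008.0, 2010.0, 2012.0, 2015.0])
dists = 1e-3 * (dates - 2000.0)
rate, root_date = root_to_tip_rate(dates, dists)
```

With real data the points scatter around the line, and the slope is best treated as a starting value for a more careful clock analysis rather than a final estimate.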

Sometimes, the exact sampling time is not known; it may be missing from the annotations, or recorded to a particular precision (e.g. Given an initial guess of tip dates model is optimized heuristically, it is challenging to apply standard likelihood-based approaches such as profiling to estimate confidence intervals.

Using simulated data, we demonstrate that explicit incorporation of a relaxed clock leads to more accurate inference of the mean rate of evolution in addition to providing information on the variation in evolutionary rates.

Our implementation generates confidence intervals for the evolutionary rate and the time to the most recent common ancestor using parametric bootstrapping (PB), which lends itself well to parallelization. a strict) molecular clock, as advised by Duchene et al.

Molecular clock models relate observed genetic diversity to calendar time, enabling estimation of times of common ancestry.

Many large datasets of fast-evolving viruses are not well fitted by molecular clock models that assume a constant substitution rate through time, and more flexible relaxed clock models are required for robust inference of rates and dates.
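To make the idea of a relaxed clock built on a Gamma-Poisson mixture more concrete, the sketch below computes the marginal probability of observing a given number of substitutions on one branch when the branch rate is Gamma distributed and the count is Poisson given that rate. This marginal is a negative binomial distribution. The parameterization (shape and scale of the Gamma, branch duration `tau`, alignment length `s`) and the function name are assumptions chosen for illustration and need not match the authors' implementation.

```python
import math

def gamma_poisson_logpmf(x, tau, s, shape, scale):
    """log P(X = x) for X | r ~ Poisson(r * tau * s), r ~ Gamma(shape, scale).

    Integrating out r gives a negative binomial with size `shape` and
    success probability p = 1 / (1 + scale * tau * s).
    """
    p = 1.0 / (1.0 + scale * tau * s)
    return (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * math.log(p) + x * math.log1p(-p))

# Hypothetical values: mean rate shape * scale = 1e-3 subs/site/year,
# a branch lasting 2 years, and an alignment of 1000 sites.
shape, scale, tau, s = 2.0, 5e-4, 2.0, 1000
probs = [math.exp(gamma_poisson_logpmf(x, tau, s, shape, scale)) for x in range(200)]
total = sum(probs)
mean_count = sum(x * p for x, p in enumerate(probs))
```

Here the expected count is shape * scale * tau * s = 2 substitutions, while the variance is larger than under a strict (pure Poisson) clock, which is the extra flexibility a relaxed clock provides.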

2015), and the latest version of the least-squares dating (LSD) software also includes PB routines (To et al. In addition to running on multiple bootstrapped phylogenies, Monte Carlo simulation and PB approaches offer a highly flexible and parallelizable approach for estimating uncertainty in substitution rates and node dates (Efron and Tibshirani 1994).
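The logic of parametric bootstrapping for a rate estimate can be sketched as follows. This is a deliberately simplified illustration and not the authors' procedure: it assumes a strict clock, treats branch durations as known, and models substitution counts per branch as Poisson. The function names and the toy data are hypothetical. Each replicate is simulated independently, which is why the approach parallelizes easily.

```python
import numpy as np

def fit_rate(counts, durations, s):
    """Strict-clock maximum likelihood rate: total substitutions / total site-time."""
    return counts.sum() / (durations.sum() * s)

def parametric_bootstrap_ci(counts, durations, s, n_rep=1000, level=0.95, seed=0):
    """Percentile confidence interval for the rate by parametric bootstrap."""
    rng = np.random.default_rng(seed)
    rate_hat = fit_rate(counts, durations, s)
    # Simulate replicate data sets from the fitted model.
    expected = rate_hat * durations * s
    sims = rng.poisson(expected, size=(n_rep, len(durations)))
    # Re-estimate the rate on every replicate.
    boot_rates = sims.sum(axis=1) / (durations.sum() * s)
    alpha = 1.0 - level
    lo, hi = np.quantile(boot_rates, [alpha / 2, 1 - alpha / 2])
    return rate_hat, lo, hi

# Hypothetical branch data: durations in years, counts of substitutions,
# and an alignment of 1000 sites.
durations = np.array([1.0, 2.0, 0.5, 3.0, 1.5, 2.5])
counts = np.array([1, 2, 0, 3, 2, 2])
rate_hat, lo, hi = parametric_bootstrap_ci(counts, durations, s=1000)
```

In a full analysis the simulation step would generate data under the fitted relaxed clock and the dates would be re-estimated on each replicate, so that intervals for node dates are obtained in the same way.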

