Example of mcmctree analysis
〜Beginning guide〜 |
|
 |
April 23, 2014
Jun Inoue
|
|
MCMCTREE is a phylogenetic program for Bayesian estimation of species divergence times using soft fossil constraints under various molecular clock models. This is part of the PAML package. In this tutorial I will analyze an easy example modified from dataset of Inoue et al. (2010). Here we conduct a commonly used time estimation method, "Approximate Likelihood Method." for the datasets including more than 10 species.
Download an example and programs from mcmctreeExampleVert6.tar.gz (27MB).For Windows, you can decompress the downloaded file by right click (For Mac, just double click). In this tutorial,we will use example files in mcmctreeExampleVert6/examples directory. Here is an explanation by slide.
I am assuming that you have some basic knowledge of command line in Windows or Unix system in Linux/MacOS.
Make sure you are using the latest version of PAML. Now I am using 4.8 [April 2014].
Japanese language version is here.
|
|
The approximate likelihood method consists of 2 steps (3 steps including preparation). We conduct each analysis as different job.
0. Rough Estimation of Substitution Rate
1. Estimation of Gradient and Hessian of Branch Lengths
2. Estimation of Time and Rate by MCMC Analysis
Caution: For Windows, sometimes we need to change "attribute" of example directory by right click. Then unlock "read only".
|
|
Download and install the following programs.
Textile manipulation (Editor)
[Windows]
.
[Mac]
TEXTWRANGLER.
Tree drawing
[Windows, Mac]
FigTree
Time estimation
Download PAML from http://abacus.gene.ucl.ac.uk/software/paml.html .
[Windows]
Just copy baseml.exe and mcmctree.exe from pamlXX/bin directory to mcmctreeExampleVert6/examples directory.Windows user does not need to compile programs.
[Mac]
Decompress pamlX.X.tar.gz file.From terminal, move into the pamlXX/src directory from terminal and compile programs as follows:
Type,
make -f Makefile
Then you will obtain programs such as baseml, codeml, mcmctree.Copy baseml and mcmctree to mcmctreeExampleVert6/examples directory.
|
|
Open the following files by your own editor (notepad, TextWrangler etc.). Unlock files if they are "read only" by right click (Win) or command+I (Mac).
baseml.ctl
Control file for baseml.
mcmctree.ctl
Control file for mcmctree.
Vertebrate6.phy
Sequence file. This file contains a nucleotide alignment of 6 vertebrate species.
Vertebrate6PE.tre and Vertebrate6.tre
Tree file.This file contains constrained tree as the Newick format.
[Windows & Mac]
Make sure the topology and constraints of Vertebrate6PE.tre and Vertebrate6.tre (Newick format trees) using FigTree. You can draw the trees as follows:
|

|
0. Rough Estimation of Substitution Rate |
|
We will estimate overall substitution rate in this dataset. This estimates will be used in rgene_gamma setting in mcmctree.ctl later.
Move into the mcmctreeExampleVert6/examples directory by command prompt (Win) or terminal (Mac).
Infiles
Open baseml.ctl by your editor.Sequence file (Vertebrate6.phy) and tree file (Vertebrate6PE.tre) names are written.GTR+G model (model = 7) will be used for rate estimation.Strict clock model (clock = 1) is used because of quick estimation .
Execution of BASEML
[Windows]
From command prompt, type,
> baseml baseml.ctl
[Mac]
From terminal, type,
> ./baseml baseml.ctl
.
If the control file is named as baseml.ctl, baseml read it automatically without specifying.
Result
Result will be saved in mlb file.You can find the following result in mlb file:
Substitution rate is per time unit
0.084825 +- 0.008316
The exact value would differ depending on the program version. This value will be used in mcmctree.ctl file.
|

|
1. Estimation of Gradient and Hessian of Branch Lengths |
|
The branch lengths are estimated by maximum likelihood together with the gradient and Hessian (i.e. vector of first derivatives and matrix of second derivatives) of the likelihood function at the maximum likelihood estimates.
Infiles
Open mcmctree.ctl file by your editor and make sure the following line:
usedata = 3
This option is for the estimation of the maximum likelihood estimates (MLE) of branch lengths, gradient (g), and Hessian (H).
Execution of MCMCTREE
Type,
> mcmctree
Result
The estimated gradient and Hessian will be saved in out.BV file.
|

|
2. Estimation of Time and Rate by MCMC Analysis |
|
We will estimate divergence times by Bayesian analysis.
Infiles
Change the name "out.BV" to "in.BV".
Open mcmctree.ctl file and change the usedata option as follows:
usedata = 2
This option will start the MCMC analysis using Approximate Likelihood Calculation. Also change the name of output file as follows:
outfile = out_usedata2
rgene_gamma
rgene_gamma specifies the prior distribution of overall substitution rate μ with assuming the gamma distribution. Here we need to specify the shape and scale parameters of the gamma distribution.
The gamma distribution is described by the shape parameter (α) and scale parameter (β).Let m be the mean and s be the standard deviation of the gamma distribution. These are related as follows:
a = (m/s)^2
b = m/s^2.
For example, μ is 2/2 =1,then it means 1 change/site/time unit.If time unit is 100 MY,the overall substitution rate is 10^-8 substitution/site/year.
According to the rough estimation using BASEML (see above), we will use m = 0.08 and s = 0.08.The shape parameter α is
α = (0.08/0.08)^2
= 1.
And the scale parameter is
β = 0.08/0.08^2
=12.5.
So in the control file, we specify rgene_gamma as follows.
rgene_gamma = 1 12.5
sigma2_gamma
sigma2_gamma specifies how variable the rates are across branches. We specify the shape and scale parameters (α and β) in the gamma prior for parameter σ^2. In the control file, we set
sigma2_gamma = 1 4.5 .
See the PAML manual for detail.
Execution of MCMCTREE
> mcmctree
After you start MCMCTREE, check acceptance proportion on your screen.Ideally they ranged around 30% (20〜40% is better, 15〜70% is acceptable).
finetune = 1: 0.06 0.5 0.008 0.12 0.4 * times, rates, mixing, paras, RateParas
If the acceptance proportion are not stable, change "finetune = 1" to "finetune = 0" and modify the finetune values manually.In this case, you need to quit analysis using "Ctl+C". See the manual for detail.
Result
The result will be saved in out_usedata2 file.You can find the estimated time and 95% confidence interval for each node.
|

|
|
FigTree.tre (output file of mcmctree program) can be used for the infile of FigTree.
|

|
↓ Add constraints by graphics editor (e.g., Illustrator). |

|
|
For more advanced analysis , see the following instruction.
(A step-by-step manual for MCMCtree with an example dataset).
|
|
|