Models of gene content

A phylogenetic mixture model for gene family loss in parasitic bacteria. M. Spencer and A. Sangaralingam (2009). Molecular Biology and Evolution 26:1901-1908 (the journal version is authoritative). An appendix and an erratum are also available.

Modelling prokaryote gene content. M. Spencer, E. Susko, A. J. Roger (2006). Evolutionary Bioinformatics Online 2:165-186 (the journal version is authoritative).

Phylogenetics workshop, International Conference on Microbial Genomes 2005

Slides: modelling prokaryote gene content, Mathematics of Evolution and Phylogeny 2005

Software

ML_genedata0.4.tgz Tar archive of C code for ML_genedata. Implements mixture models for gain and loss of gene families on a phylogeny. Includes data originally from the COG database (Tatusov et al., 2003), downloaded 13 May 2004. See README for details.

Tree files

Tree files from A phylogenetic mixture model for gene family loss in parasitic bacteria. M. Spencer and A. Sangaralingam (Molecular Biology and Evolution 26:1901-1908, 2009):

16S tree, 16S edge lengths, estimated as described in the methods. Revised version, 19/2/10, see erratum.
16S tree, model F edge lengths, estimated as described in the methods. Revised version, 19/2/10, see erratum.
Conditioned logdet tree, model F edge lengths, estimated as described in the appendix.

Conditioned genome reconstruction

Conditioned genome reconstruction: how to avoid choosing the conditioning genome. M. Spencer, D. Bryant, E. Susko (2007). Systematic Biology 56: 25-43 (the journal version is authoritative).

Poster: Conditioned genome reconstruction: how to avoid choosing the conditioning genome. M. Spencer, D. Bryant, E. Susko, CIAR Program in Evolutionary Biology Annual Meeting, September 2005 [out of date: the paper describes some improvements]

Software

bionj_cond1.0.tar.gz Tar archive of C code for bionj_cond. Improves the reliability of conditioned genome reconstruction (Lake and Rivera 2004, Rivera and Lake 2004). Combines information from all possible conditioning genomes, using a supertree method based on BIONJ (Gascuel 1997). See README for details.

cond_logdet0.3.tar.gz Tar archive of C code for cond_logdet. Estimates conditioned logdet distances for all choices of conditioning genome. Output files can be used as input for bionj_cond. See README for details.

Data

Spencer_etal_data.nex NEXUS file containing the gene family presence/absence data used in the paper. Originally from the COG database (Tatusov et al., 2003), downloaded 13 May 2004.

References

Gascuel, O. (1997). BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 14: 685-695.

Lake, J. A. and Rivera, M. C. (2004). Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Molecular Biology and Evolution 21: 681-690.

Rivera, M. C. and Lake, J. A. (2004). The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431: 152-155.

Tatusov, R. L. et al. (2003). The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4: 41.


