Effective Species Count

General Introduction

This web page calculates the effective species count for a user-supplied phylogenetic tree and nucleotide substitution model.  The effective species count measures how efficiently sequences from the species at the leaves of the phylogenetic tree can be used to reconstruct the equilibrium distribution that governs each position of a multiple DNA sequence alignment.

When several species are very closely related, the effective species count will be above, but very near, 1.0, because the high correlation between the species' sequences means that any one of them carries almost all of the available information.  When several species are distantly related, the effective species count will be near the number of species, since each is essentially independent of the others.  This software provides solutions for the intermediate cases, where the extent of the correlation between species' sequences is not obvious.

Greedy, AntiGreedy, and AllOnly

The software allows the user to request a greedy or antiGreedy analysis instead of the default allOnly analysis.  Starting from a set of start species, greedy finds the additional species that most increases the effective species count, adds it to the collection, and repeats, adding one species at a time.  AntiGreedy instead adds, one at a time, the species that least increases the effective species count.  AllOnly reports the effective species count only for the set of start species and for the set of all species.
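The stepwise selection described above can be sketched in Python as follows.  This is a minimal illustration, not the program's own code: the function name `stepwise_order`, the `score` callable, and the `toy_score` stand-in are all hypothetical, and ties are broken by input order, which is an assumption.  The real effective species count depends on the tree and the substitution model, so `toy_score` only mimics its qualitative behavior (close relatives add little, distant relatives add nearly 1).

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical clade labels used only by the toy scoring function below.
CLADE = {"human": "primate", "chimp": "primate",
         "mouse": "rodent", "rat": "rodent", "chicken": "bird"}


def toy_score(species: FrozenSet[str]) -> float:
    """Stand-in for the effective species count (NOT the real formula).

    Each distinct clade contributes 1.0; each extra member of an
    already-represented clade contributes only 0.1.
    """
    clades = {CLADE[s] for s in species}
    return len(clades) + 0.1 * (len(species) - len(clades))


def stepwise_order(start: Iterable[str],
                   all_species: Iterable[str],
                   score: Callable[[FrozenSet[str]], float],
                   anti: bool = False) -> List[Tuple[str, float]]:
    """Add species one at a time, returning (species, score after adding).

    anti=False picks the species giving the largest score (greedy);
    anti=True picks the species giving the smallest score (antiGreedy).
    Ties go to the species listed first in all_species (an assumption).
    """
    chosen = set(start)
    remaining = [s for s in all_species if s not in chosen]
    pick = min if anti else max
    steps = []
    while remaining:
        best = pick(remaining, key=lambda s: score(frozenset(chosen | {s})))
        chosen.add(best)
        remaining.remove(best)
        steps.append((best, score(frozenset(chosen))))
    return steps


if __name__ == "__main__":
    everyone = ["human", "chimp", "mouse", "rat", "chicken"]
    for label, anti in (("greedy", False), ("antiGreedy", True)):
        order = stepwise_order(["human"], everyone, toy_score, anti=anti)
        print(label, [name for name, _ in order])
```

With this toy score, greedy reaches for distant clades first, while antiGreedy adds the closest relatives first.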

Phylogenetic Tree

The phylogenetic tree should be supplied in Newick format (see, e.g., http://en.wikipedia.org/wiki/Newick_format) either as text directly, or as a file.  The species in the leaves of the phylogenetic tree should be named.
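As an illustration of the expected input, the sketch below lists the leaf names in a small Newick string.  The tree, its branch lengths, and the helper name `leaf_names` are made up for this example, and the helper handles only unquoted labels; it is not a full Newick parser and is not the parser this software uses.

```python
import re
from typing import List

# Hypothetical example tree: five named leaves with branch lengths.
EXAMPLE_TREE = ("((human:0.01,chimp:0.012):0.08,"
                "(mouse:0.09,rat:0.1):0.25,chicken:0.5);")


def leaf_names(newick: str) -> List[str]:
    """Return the unquoted leaf labels of a Newick string, in order.

    A label that directly follows '(' or ',' names a leaf; a label that
    follows ')' names an internal node and is skipped.
    """
    names = []
    for delim, label in re.findall(r"([(),])\s*([^(),:;\s]*)", newick):
        if delim in "(," and label:
            names.append(label)
    return names


if __name__ == "__main__":
    print(leaf_names(EXAMPLE_TREE))
```

A quick check like this before submitting can help confirm that every leaf carries the name you expect.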

Nucleotide Substitution Model

Several nucleotide substitution models are supported.  They are:
  • Fel81 -- from Felsenstein (1981) J Mol Evol 17(6):368-376 (PubMed 7288891)
  • HKY85 -- from Hasegawa, Kishino, and Yano (1985) J Mol Evol 22(2):160-174 (PubMed 3934395)
  • HKY85Slow -- same citation as HKY85, but the model omits the normalizing rate constant
  • HB98 -- Halpern and Bruno (1998) Mol Biol Evol 15(7):910-917 (PubMed 9656490)
  • New05 -- Experimental -- Newberg (2005) Technical Report 05-08, Rensselaer Polytechnic Institute Department of Computer Science, Troy, NY.
  • New06 -- Experimental -- no cite available
Each of these requires a "foreground" nucleotide equilibrium probability distribution.  In addition, the HB98, New05, and New06 models require a "background" nucleotide substitution model that specifies the nucleotide substitution process in the absence of selection pressures.  The choices are:
  • JC69 -- from Jukes and Cantor (1969) Mammalian Protein Metabolism 3:21-132, Academic Press.
  • Kim80 -- from Kimura (1980) J Mol Evol 16(2):111-120 (PubMed 7463489)
  • Fel81 -- from Felsenstein (1981) J Mol Evol 17(6):368-376 (PubMed 7288891)
  • HKY85 -- from Hasegawa, Kishino, and Yano (1985) J Mol Evol 22(2):160-174 (PubMed 3934395)
  • HKY85Slow -- same citation as HKY85, but the model omits the normalizing rate constant
If Fel81, HKY85, or HKY85Slow is chosen for the background nucleotide substitution model, then a "background" nucleotide equilibrium probability distribution must be supplied.

If HKY85, HKY85Slow, or Kim80 is specified for either the foreground or background nucleotide substitution model, then a transition-to-transversion ratio must be supplied.
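The parameter rules above can be summarized as a small lookup.  The sketch below is only a restatement of those rules in Python; the function name `required_inputs` and the returned labels are hypothetical, and it assumes a single transition-to-transversion ratio entry even when both models need one, which the text above does not specify.

```python
from typing import List, Optional

FOREGROUND_MODELS = {"Fel81", "HKY85", "HKY85Slow", "HB98", "New05", "New06"}
BACKGROUND_MODELS = {"JC69", "Kim80", "Fel81", "HKY85", "HKY85Slow"}
NEEDS_BACKGROUND_MODEL = {"HB98", "New05", "New06"}
NEEDS_BACKGROUND_EQUILIBRIUM = {"Fel81", "HKY85", "HKY85Slow"}
NEEDS_TI_TV_RATIO = {"HKY85", "HKY85Slow", "Kim80"}


def required_inputs(foreground: str,
                    background: Optional[str] = None) -> List[str]:
    """List the inputs a model choice requires, per the rules above."""
    if foreground not in FOREGROUND_MODELS:
        raise ValueError(f"unknown foreground model: {foreground}")
    # Every foreground model needs a foreground equilibrium distribution.
    required = ["foreground equilibrium distribution"]
    if foreground in NEEDS_BACKGROUND_MODEL:
        if background not in BACKGROUND_MODELS:
            raise ValueError(f"unknown background model: {background}")
        required.append("background model")
        if background in NEEDS_BACKGROUND_EQUILIBRIUM:
            required.append("background equilibrium distribution")
    else:
        background = None  # a background model is not used in this case
    # Assumption: one ratio entry is listed even if both models need one.
    if foreground in NEEDS_TI_TV_RATIO or background in NEEDS_TI_TV_RATIO:
        required.append("transition-to-transversion ratio")
    return required


if __name__ == "__main__":
    print(required_inputs("HB98", "HKY85"))
```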

Output Level

Brief output gives only the effective species counts.  Verbose output shows the program's search for the best or worst species to add to the set of start species.  Very verbose output gives additional information, such as pairwise distances between the species, and reports on two alternatives for determining a nucleotide equilibrium probability distribution: (1) optimal sequence weights (see Newberg, McCue, and Lawrence, 2005, Stat Appl Genet Mol Biol 4:Article13, PubMed 16646830) and (2) equal sequence weights.
