If you are running the Gibbs sampler locally, we suggest that you periodically check for updates and new version releases. Below are descriptions of a few specific changes to the Gibbs sampler that affect the web tutorial examples, along with the versions in which these changes occurred.

Previous versions (up to version 2.05) treated even width and odd width palindromes separately. An odd width palindrome was specified by an odd motif width (*e.g.* 17 positions), with the center position in the model "on" but unpaired in the palindromic model. In more recent versions of Gibbs (versions 2.06 and higher) it is NOT necessary to specify even and odd palindromes separately.

The Centroid solution option is available with versions 3.0 and higher.

The Gibbs sampler currently (as of version 3.0) allows four sampling modes: site sampling, motif sampling, recursive sampling, and centroid sampling. These sampling modes were developed and implemented over time and, in the order listed, represent increasing levels of sophistication. Here we provide brief descriptions of appropriate uses for each of these sampling modes. For a more thorough description of the site, motif, and recursive sampling modes, and their use on the Gibbs web server, see our chapter in Current Protocols in Bioinformatics. For a more thorough description of the recursive and centroid sampling modes, and their use on the web server and at the command line, see our on-line tutorial on analysis of co-expression data.

The Site Sampling mode was originally described in our first paper on Gibbs sampling for biological sequences (Lawrence et al., 1993). In this mode, the sampler will identify exactly one site per input sequence for a predicted motif. Given this restriction, the site sampling mode is not suitable for analysis of the types of high-throughput transcriptomics data that are being generated these days. However, we continue to find site sampling useful for very specific cases.

Site sampling is appropriate when the input data consist of sequences for which you have a reasonable expectation that each sequence has one binding site for the transcription factor. Specifically, we use site sampling to analyze sequence data from DNaseI footprinting or EMSA (electrophoretic mobility shift assay) experiments. For example, a site sampling run for the seven DNaseI footprints of the *E. coli* PhoP transcription factor produces the following results.

Site sampling is invoked by:

- clicking the "Site Sampler" radio button on the Gibbs Sampler web page, and:
  - providing the sequences in fasta format
  - indicating the number of motifs to search for, in the "No. of different motifs (patterns)" field
  - providing a number for the "Motif Width(s)" field
  - (note: leave blank the "Est. total sites for each motif type" field)
  - (note: leave blank the "Max sites per seq" field)

- providing at the Gibbs Sampler command-line:
  - a fasta file containing the sequences
  - an integer number for the motif width
  - (note: **do not** provide an estimate of the number of sites, and **do not** use the "-E" option to indicate the upper limit of number of sites per sequence)

The Motif Sampling mode was one of the first extensions to the Gibbs sampler, and was described by Neuwald et al. (1995). In this mode, the sampler will identify anywhere between zero sites and the maximum possible number of sites per sequence (*e.g.*, a 50-base sequence could have at most 5 non-overlapping 10-mer sites). This allows the sampler quite a lot of freedom. This sampling mode is generally not as sensitive as recursive sampling or centroid sampling, but because it is less compute-intensive, it can be useful during initial, exploratory motif discovery tasks.
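The upper bound in the example above is integer division of the sequence length by the motif width. A minimal sketch in Python follows; the helper name `max_nonoverlapping_sites` is hypothetical and is used here only for illustration, not taken from the Gibbs sampler itself.

```python
def max_nonoverlapping_sites(seq_len: int, motif_width: int) -> int:
    """Largest number of non-overlapping sites of a given width that fit in a sequence."""
    if motif_width <= 0:
        raise ValueError("motif_width must be positive")
    # Each site occupies motif_width positions; leftover positions cannot hold a site.
    return max(seq_len, 0) // motif_width


# The example from the text: a 50-base sequence and 10-mer sites.
print(max_nonoverlapping_sites(50, 10))  # 5
```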

Motif sampling is appropriate when you have a reasonable expectation that the input sequences contain a common motif, but you are not certain that every sequence contains a site for the motif, and some sequences may contain multiple sites with no obvious upper limit on the number of sites per sequence. One example in which we have found motif sampling useful is for exploring bacterial genomes (specifically, the extracted intergenic sequences of a bacterial genome) for possible repetitive sequences. For example, a motif sampling run on all of the intergenic regions of the *E. coli* genome identifies the REP element (Rudd, 1998).

Note that, in the example output file, we estimated a total of 100 motif sites at the command line. This is only an estimate, and simply directs the sampler to initiate the motif search by selecting 100 sites at random to build the initial model. The sampler, in fact, found many more than 100 sites for the strong REP motif. It is also important to note that we use this type of analysis only in an exploratory manner. We do not use the output as definitive descriptions of repetitive elements.
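To make the role of the site estimate concrete, the sketch below picks a requested number of random start positions and tallies per-position residue counts, which is the raw material for an initial motif model. This is an illustration only: the function names are hypothetical, sites are allowed to overlap for simplicity, and the actual initialization inside the Gibbs sampler may differ in detail.

```python
import random


def random_initial_sites(seqs, width, n_sites, seed=None):
    """Pick up to n_sites distinct random (sequence index, start) pairs."""
    rng = random.Random(seed)
    # Every valid start position in every sequence long enough to hold a site.
    candidates = [(i, s) for i, seq in enumerate(seqs)
                  for s in range(len(seq) - width + 1)]
    return rng.sample(candidates, min(n_sites, len(candidates)))


def count_matrix(seqs, sites, width, alphabet="ACGT"):
    """Per-position residue counts over the chosen sites."""
    counts = [{a: 0 for a in alphabet} for _ in range(width)]
    for i, start in sites:
        for offset, base in enumerate(seqs[i][start:start + width]):
            if base in counts[offset]:
                counts[offset][base] += 1
    return counts


# Toy input (made-up sequences, not from the E. coli example).
seqs = ["ACGTACGTAC", "TTGACGTCAA", "ACG"]
sites = random_initial_sites(seqs, width=4, n_sites=5, seed=1)
model = count_matrix(seqs, sites, width=4)
```

The estimate only seeds the search; as the text notes, the number of sites in the final result can be far larger than the number used to start.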

Motif sampling is invoked by:

- clicking the "Motif Sampler" radio button on the Gibbs Sampler web page, and:
  - providing the sequences in fasta format
  - indicating the number of motifs to search for, in the "No. of different motifs (patterns)" field
  - providing a number for the "Motif Width(s)" field
  - providing a number in the "Est. total sites for each motif type" field
  - (note: leave blank the "Max sites per seq" field)

- providing at the Gibbs Sampler command-line:
  - a fasta file containing the sequences
  - an integer number for the motif width
  - an integer that provides an estimate of the total number of motif sites
  - (note: **do not** use the "-E" option to indicate the upper limit of number of sites per sequence)

The Recursive Sampling mode implements a more advanced sampling algorithm than that used by the earlier modes, and was described by Thompson et al. (2003). In this mode, the sampler will identify between zero sites and a maximum number of sites per sequence that is set by the user. This sampling mode is more compute-intensive than site sampling or motif sampling, but is typically more sensitive, and thus is currently the default mode for running the Gibbs Sampler on the web server.

Recursive sampling is appropriate when you have a reasonable expectation that the input sequences contain a common motif, and you can reasonably estimate an upper limit on the number of sites per sequence. We have used this sampling mode extensively for phylogenetic footprinting and analysis of co-expressed genes (*e.g.*: Conlan et al., 2005 and Wan et al., 2004), and examples that use recursive sampling can be found on our tutorial pages.

Recursive sampling is invoked by:

- clicking the "Recursive Sampler" radio button on the Gibbs Sampler web page, and:
  - providing the sequences in fasta format
  - indicating the number of motifs to search for, in the "No. of different motifs (patterns)" field
  - providing a number for the "Motif Width(s)" field
  - providing a number in the "Max sites per seq" field
  - optional: providing a number in the "Est. total sites for each motif type" field

- providing at the Gibbs Sampler command-line:
  - a fasta file containing the sequences
  - an integer number for the motif width
  - optional: an integer that provides an estimate of the total number of motif sites
  - using the "-E *m*" option to indicate the upper limit (*m*) of number of sites per sequence

The Centroid Sampling mode is a modification of the Recursive Sampling mode. Similar to recursive sampling, in this mode, the sampler will identify between zero sites and the maximum number of sites per sequence set by the user. Centroid sampling is the most recently developed sampling mode, and represents a significant departure from previous approaches, in that the algorithm does not search for an optimal solution, *i.e.*, one that maximizes a motif score. Instead, in the centroid sampling mode, the algorithm provides a centroid motif solution, which is that alignment of sites that has the minimum total distance to the set of alignments sampled from the *a posteriori* probability distribution of alignments.
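The idea of a minimum-total-distance alignment can be sketched in a few lines of Python. Two simplifications here are assumptions made for illustration and are not taken from the Gibbs sampler: an alignment is represented as a set of (sequence index, start position) sites, the distance between two alignments is the number of sites that appear in exactly one of them, and the candidate centroids are restricted to the sampled alignments themselves. See the publication describing the centroid sampler for the actual definition and algorithm.

```python
def alignment_distance(a, b):
    """Number of sites present in exactly one of the two alignments."""
    return len(a ^ b)


def centroid_of_samples(samples):
    """Return the sampled alignment with the smallest total distance to all samples."""
    return min(samples,
               key=lambda cand: sum(alignment_distance(cand, s) for s in samples))


# Four toy alignments drawn from a hypothetical posterior sample.
samples = [
    frozenset({(0, 12), (1, 40)}),
    frozenset({(0, 12), (1, 40), (2, 7)}),
    frozenset({(0, 12), (1, 41)}),
    frozenset({(0, 12), (1, 40)}),
]
centroid = centroid_of_samples(samples)
print(sorted(centroid))  # [(0, 12), (1, 40)]
```

In this toy sample the chosen alignment is the one that most of the samples agree on, rather than the one with the highest score under any motif model, which is the distinction the paragraph above draws.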

Centroid sampling is appropriate when you have a reasonable expectation that the input sequences contain a common motif, and you can reasonably estimate an upper limit on the number of sites per sequence. Examples using the centroid sampler can be found on our tutorial pages.

The Centroid Sampler is described in Thompson et al. (2007). The centroid sampler also allows the incorporation of a full phylogenetic model as described in Newberg et al. (2007).

Centroid sampling is invoked by:

- clicking the "Centroid Sampler" radio button on the Gibbs Sampler web page, and:
  - providing the sequences in fasta format
  - indicating the number of motifs to search for, in the "No. of different motifs (patterns)" field
  - providing a number for the "Motif Width(s)" field
  - providing a number in the "Max sites per seq" field
  - optional: providing a number in the "Est. total sites for each motif type" field

- providing at the Gibbs Sampler command-line:
  - a fasta file containing the sequences
  - an integer number for the motif width
  - optional: an integer that provides an estimate of the total number of motif sites
  - using the "-E *m*" option to indicate the upper limit (*m*) of number of sites per sequence
  - using the options "-m -nopt -y" to disable maximal map, near optimal map, and frequency sampling, respectively
  - using the "-bayes *x,y*" option to specify the number of burn-in (*x*) and sampling (*y*) iterations
  - using the "-align_centroid" option to cause the sampler to align the centroid sites, thus providing a motif matrix
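The command-line requirements of the four sampling modes can be summarized in one small Python helper that assembles the argument list for a given mode. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the arguments are emitted in the order in which the lists in this section present them, the name of the Gibbs executable is deliberately left out because it depends on your installation, and the file name and numbers in the usage lines are made up. Only the options named in this section are used.

```python
def gibbs_args(mode, fasta, width, est_sites=None, max_sites=None,
               burn_in=None, samples=None, align_centroid=True):
    """Assemble command-line arguments for one sampling mode (executable name omitted)."""
    args = [fasta, str(width)]
    if mode == "site":
        # Site sampling: no site estimate and no -E option.
        if est_sites is not None or max_sites is not None:
            raise ValueError("site sampling takes neither a site estimate nor -E")
        return args
    if mode == "motif":
        # Motif sampling: a site estimate is required, -E must not be used.
        if est_sites is None:
            raise ValueError("motif sampling needs an estimate of total sites")
        if max_sites is not None:
            raise ValueError("motif sampling does not use -E")
        return args + [str(est_sites)]
    if mode in ("recursive", "centroid"):
        # Both modes need an upper limit on sites per sequence; the estimate is optional.
        if max_sites is None:
            raise ValueError(f"{mode} sampling needs -E (max sites per sequence)")
        if est_sites is not None:
            args.append(str(est_sites))
        args += ["-E", str(max_sites)]
        if mode == "centroid":
            if burn_in is None or samples is None:
                raise ValueError("centroid sampling needs burn-in and sampling iterations")
            # Disable maximal map, near optimal map, and frequency sampling.
            args += ["-m", "-nopt", "-y", "-bayes", f"{burn_in},{samples}"]
            if align_centroid:
                args.append("-align_centroid")
        return args
    raise ValueError(f"unknown mode: {mode}")


# Illustrative values only.
print(" ".join(gibbs_args("recursive", "seqs.fa", 16, max_sites=2)))
print(" ".join(gibbs_args("centroid", "seqs.fa", 16, max_sites=2,
                          burn_in=100, samples=500)))
```

Raising an error for disallowed combinations mirrors the "do not" notes in the lists above, so a mismatch between the chosen mode and the supplied arguments is caught before a run is launched.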