!
! This is a sample prior file for use with the module sampler. It is essentially
! the same as the one used in "Decoding Human Regulatory Circuits".
!
! The command line that we used with the module sampler is:
! /home/thompson/proj-bern/mpcluster/Gibbs.mpi.x86 10,10,8,10,10
! -E 9 -n -m -D -C 0.01 -f 0 -S 50 -p 100
! -P -B -o -X 2,5,1,75000 -K
!
! The -X option is used for simulated tempering. It only works with the MPI version
! of Gibbs. -K implements the sampling method for k, the number of sites/seq, described
! in the supplemental text. This option seems to improve the MAP solution in
! some cases. Leaving it off causes Gibbs to perform a different inference on k before sampling.
!
! Information on obtaining the module sampler may be found at
! http://bayesweb.wadsworth.org/gibbs/gibbs.html
! If you desire an MPI version, please include that in your request and please specify
! the platform you will be using.
!
! If you have questions about this file or general Gibbs questions, please contact
! me at thompson@wadsworth.org
!
! Bill Thompson
!
! Motif model priors. These are uniform for each of the 5 models.
! Each row sums to 10 (4 x 2.5). Since the default prior weight for Gibbs is 0.1,
! this prior model works out to 1 pseudocount (10 x 0.1) for each motif position.
! The order of the columns is A T C G.
>PSEUDO 1
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>
>PSEUDO 2
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>
>PSEUDO 3
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>
>PSEUDO 4
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>
>PSEUDO 5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
2.5 2.5 2.5 2.5
>
! The spacing model - 100 represents the inter-site distance from the end of
! a site to the start of the following site
>FLAT
100
>
! The number of sequences, followed by the expected number of sites for each motif type.
! Here: 48 sequences, then 36, 24, 26, 14 and 12 expected sites for motif types 1 to 5.
>Seq
48 36 24 26 14 12
>
! The prior probability of the number of sites per sequence, i.e. the probability of
! 0 sites, 1 site, ... kmax sites.
! The program normalizes these values, so you can specify them as counts.
! The number following >BLOCK is a weight; the default is 0.1.
! Note: changing the weight will affect the MAP.
>BLOCK 1
0.1 8 4 12 16 6 0.1 0.1 2 0.1
>
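! Worked example of the normalization, assuming it divides each value by the row sum:
! the ten values in the row above sum to 48.4, so the normalized prior probability of
! 3 sites is 12 / 48.4 = 0.248 (rounded) and that of 0 sites is 0.1 / 48.4 = 0.002 (rounded).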
! Prior transition, starting and ending probabilities. These do not have to be normalized.
! The transition matrix should be (no. of motifs) x (no. of motifs).
! The numbers after >TRANS are update and weight. Update should be 1 if you want Gibbs
! to calculate the transition probabilities; weight defaults to 0.8.
! Trans[t][t1] is the prior probability of a site of type t following
! a site of type t1. Thus, Trans[2][1] is the probability that a site of motif type 2
! will follow a site of motif type 1.
! In analyzing the human-mouse sequences, we didn't make any prior assumptions about
! order, hence the uniform priors.
>TRANS 0.1 1
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
>
! Starting and ending probabilities. Begin[3] is the prior probability that the
! first site in the sequence is type 3.
>BEGIN
0.2 0.2 0.2 0.2 0.2
>
>END
0.2 0.2 0.2 0.2 0.2
>
>COMMENT
Sample human-mouse prior file
uniform priors
5 models
weight on prior prob of sites/seq = 1
max sites/seq = 9
>
! The following options affect sampling. They are heuristics that may help to speed
! convergence and are included for completeness.
! Remove the leading ! to use them.
! >ALIGN weights the alignment count for inference on the number of
! sites/sequence. It is of limited use because it is hard to estimate what the
! correct weights should be. However, it can help in certain circumstances when Gibbs
! seems to be overestimating the number of sites.
!>ALIGN
!1 1E+3 1E+3 1E+4 1E+4 1E+5 1E+5 1E+4
!>
! Gibbs has 2 modes of inference on the number of sites in a sequence. The method
! described in the supplement is turned on when the MAP > 0. These parameters
! control that. Setting them low, as in this example, may speed convergence at
! the risk of Gibbs getting stuck at a negative MAP.
!>KSAMPLEMAP
!-1000000
!>
!>MINSITEMAP
!-100 0
!>
! A sample Poisson distribution on the number of sites/seq.
! A Poisson distribution is a good estimate if you don't have any other
! information on the distribution of sites in the data.
! Poisson distribution - lambda = 3.5
!>BLOCK 1
!0.30197383 1.05690842 1.84958973 2.15785469 1.88812285 1.32168600 0.77098350 0.38549175 0.16865264 0.06558714
!>
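! How the commented-out row above can be reproduced: each value matches
! 10 * exp(-lambda) * lambda^k / k! for k = 0 .. 9 with lambda = 3.5.
! For example, k = 0 gives 10 * exp(-3.5) = 0.30197383 and
! k = 1 gives 10 * exp(-3.5) * 3.5 = 1.05690842.
! Since the program normalizes the row, the factor of 10 should not change the prior.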