**Objective:** Measuring the statistical significance of
extreme sequence alignment scores is key to many important
applications, but it is difficult. To approximate alignment score
significance precisely, we draw random samples directly from a
well-chosen importance-sampling probability distribution.

**Inputs and Run Time:** This server applies our technique to
pairwise alignments of nucleic acid or amino acid sequences. To keep
the run-time short, the precision for longer sequences will not be as
good as that for shorter sequences. Specifically, the run-time is
approximately *O(mns)*, where *m* and *n* are the sequence
lengths and *s* is the number of samples. The run-time is
capped at *mns* = 1 × 10^9, approximately 1
hour, by reducing the number of samples as necessary. Please examine
the output to see how many samples were permitted.
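
The cap above can be turned into a quick estimate of how many samples a request is likely to receive. The following is a minimal sketch, not the server's code: the function name `permitted_samples` is hypothetical, and rounding down to a whole number of samples is an assumption. The output of an actual run remains the authoritative source for the sample count.

```python
# Cap on m * n * s, as stated in the text above.
MAX_WORK = 10**9


def permitted_samples(m, n, requested):
    """Return the largest s <= requested such that m * n * s <= MAX_WORK.

    Rounding down (integer division) is an assumption of this sketch.
    """
    if m <= 0 or n <= 0 or requested <= 0:
        raise ValueError("m, n, and requested must all be positive")
    return min(requested, MAX_WORK // (m * n))


print(permitted_samples(40, 40, 1000))      # 1000 (default settings fit under the cap)
print(permitted_samples(2000, 2000, 1000))  # 250 (reduced to respect the cap)
```

Under these assumptions, two sequences of length 2000 leave room for only a quarter of the default 1000 samples, which is why precision degrades as the sequences grow.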

Default values are 1000 samples for a 40 × 40 local sequence alignment of amino acid sequences using the BLOSUM62 scoring matrix, SWISSPROT residue frequencies, an insertion start score of -12, and an insertion extension cost of -1; this set of values has a run-time of about 10 seconds.

**Outputs and Temperature:** The temperature that
parameterizes the importance sampling distribution is chosen so
that approximately half the generated importance samples contribute a
non-zero value to the importance sampling sum that determines the
*p*-value of the specified target score. The server provides
*p*-values for scores near the target score.
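
To illustrate why a sampling distribution tuned so that about half the samples contribute is useful, here is a toy importance-sampling sketch. It is not the alignment sampler used by the server: it estimates the tail probability of a standard normal variable, and the proposal distribution (a normal centered on the target) is an assumption chosen only because it makes roughly half the samples reach the target. The function name `tail_probability_is` is hypothetical.

```python
import math
import random


def tail_probability_is(target, n_samples, seed=0):
    """Estimate P(X >= target) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the proposal N(target, 1). Each sample that
    reaches the target contributes its importance weight, the ratio of
    the N(0, 1) density to the proposal density; all others contribute 0.
    Returns (estimate, fraction of samples with a non-zero contribution).
    """
    rng = random.Random(seed)
    total = 0.0
    nonzero = 0
    for _ in range(n_samples):
        x = rng.gauss(target, 1.0)
        if x >= target:
            # phi(x) / phi(x - target) = exp(-target * x + target**2 / 2)
            total += math.exp(-target * x + 0.5 * target * target)
            nonzero += 1
    return total / n_samples, nonzero / n_samples


estimate, fraction = tail_probability_is(4.0, 50_000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
print(estimate, exact, fraction)
```

Sampling directly from N(0, 1) would reach a target of 4 only about 3 times in 100,000 draws, whereas the shifted proposal reaches it about half the time and corrects for the shift through the weights.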

To find the *p*-value that interests you, find the row
beginning with the text "RESULT" followed by your score of interest,
and read across the line to discover the estimate of the score's
*p*-value, as well as various statistics to help you evaluate
your confidence in the estimate. If the number of samples is not
large enough, there may be some scores for which no *p*-value
is computed. In this case, the mathematics indicates use of the
*p*-value for the first higher score with an available
*p*-value, but instead we recommend rerunning the simulation
with more samples or a more appropriate target score.
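
For readers who post-process the results, the fallback rule described above (use the first higher score that has a *p*-value) can be sketched as follows. The function name `p_value_or_next_higher`, the dictionary layout, and the numbers in the example are all hypothetical; rerunning with more samples or a better target score remains the recommended approach.

```python
def p_value_or_next_higher(p_by_score, score):
    """Look up a p-value, falling back to the first higher score that has one.

    p_by_score maps each score to its p-value, or to None when no
    p-value was computed. Returns (score_used, p_value), or None when
    neither the score nor any higher score has a p-value.
    """
    candidates = sorted(
        s for s, p in p_by_score.items() if s >= score and p is not None
    )
    if not candidates:
        return None
    used = candidates[0]
    return used, p_by_score[used]


# Hypothetical values, for illustration only.
table = {50: 1e-3, 51: None, 52: 4e-4}
print(p_value_or_next_higher(table, 51))  # (52, 0.0004)
```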

Be aware that the *p*-value estimates for scores outside the
central range of the displayed scores can be imprecise. Also note
that some of the listed *p*-values in the 3rd output
column are too small to represent as double-precision floating-point
numbers and will underflow; to avoid this problem, work with the
base-10 logarithm given in the 6th output column.
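
The following sketch shows the underflow and one way to stay in log space. The value -400.3 is hypothetical, as are the helper names `scale_log10_p` and `format_log10_p`; the scaling factor stands in for any positive multiplier you might apply to a *p*-value.

```python
import math

# A p-value this small cannot be stored as a double: parsing it gives 0.0.
print(float("1e-400"))  # 0.0


def scale_log10_p(log10_p, factor):
    """Return log10(p * factor), computed without ever forming p itself."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return log10_p + math.log10(factor)


def format_log10_p(log10_p, digits=2):
    """Render 10**log10_p as 'mantissa × 10^exponent' without underflow."""
    exponent = math.floor(log10_p)
    mantissa = 10.0 ** (log10_p - exponent)
    return f"{mantissa:.{digits}f} × 10^{exponent}"


log10_p = -400.3  # hypothetical value read from the logarithm column
print(format_log10_p(log10_p))
print(format_log10_p(scale_log10_p(log10_p, 1_000_000)))
```

Comparisons also work directly on the logarithms: a smaller logarithm means a smaller *p*-value, so no conversion back to a floating-point *p*-value is needed.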

**Citing:** If you use this technique or this server in
your work, please cite:

Lee A. Newberg (2008) Significance of gapped sequence alignments. J Comput Biol, 15(9), 1187-1194. doi: 10.1089/cmb.2008.0125. http://ccmbweb.ccv.brown.edu/align_significance.html.