1) Directories


b_m -> Random iterative algorithm.

    b_m_1      : Random iterative algorithm with single-type partitioning.

    b_m_1_as   : Random iterative algorithm with single-type partitioning.
                 The iteration terminates when all possible partitions give 
                 no improvement. 

    b_m_2      : Random iterative algorithm with double-type partitioning.

    b_m_2_as   : Random iterative algorithm with double-type partitioning.
                 The iteration terminates when all possible partitions give 
                 no improvement. 

    b_m_all    : Random iterative algorithm with random partitioning.

    b_m_all_as : Random iterative algorithm with random partitioning.
                 The iteration terminates when all possible partitions give 
                 no improvement. 

    b_m_t      : Random iterative algorithm with tree-dependent partitioning.

    b_m_t_as   : Random iterative algorithm with tree-dependent partitioning.
                 The iteration terminates when all possible partitions give 
                 no improvement. 


ia -> Best-first iterative algorithm.

   pia_1 : Best-first iterative algorithm with single-type partitioning.
   pia_2 : Best-first iterative algorithm with double-type partitioning.
   pia_t : Best-first iterative algorithm with tree-dependent partitioning.
   ria   : Round-robin iterative algorithm.


tia -> Tree-based iterative algorithm for Protein.

    apdp       : Similarity between each pair of sequences is estimated 
                 with its pairwise alignment score obtained by DP.

    mktree     : The UPGMA method constructs a guided tree from the 
                 matrix of similarity scores.

    tree_based : Tree-based algorithm.
    tria_i1    : Tree-based round-robin algorithm.
    tria       : Tree-based round-robin iterative algorithm.

    tpia_1     : Tree-based best-first iterative algorithm 
                 with single-type partitioning.

    tpia_2     : Tree-based best-first iterative algorithm 
                 with double-type partitioning.

    tpia_t     : Tree-based best-first iterative algorithm 
                 with tree-dependent partitioning.

    tbm1as     : Tree-based random iterative algorithm 
                 with single-type partitioning.

    tbm2as     : Tree-based random iterative algorithm 
                 with double-type partitioning.

    tbmtas     : Tree-based random iterative algorithm 
                 with tree-dependent partitioning.


dna -> Tree-based iterative algorithm for DNA.

    dapdp   : Similarity between each pair of sequences is estimated 
              with its pairwise alignment score obtained by DP.

    dmktree : The UPGMA method constructs a guided tree from the 
              matrix of similarity scores.

    dtria   : Tree-based round-robin iterative algorithm.

    dtpia_1 : Tree-based best-first iterative algorithm 
              with single-type partitioning.

    dtpia_2 : Tree-based best-first iterative algorithm 
              with double-type partitioning.

    dtpia_t : Tree-based best-first iterative algorithm 
              with tree-dependent partitioning.


gdp -> Group-to-group two-dimensional dynamic programming.


Module -> functions.



2) Algorithms


Tree-based algorithm

Various tree-based algorithms for multiple sequence alignment have
been devised.  Among them, we chose a typical algorithm to evaluate
the performance of tree-based algorithms. A tree-based algorithm uses
2-way dynamic programming (DP) in a group-to-group manner (Barton,
1990) to align two sub-alignments.

In this algorithm, the similarity between each pair of sequences is
first estimated with its pairwise alignment score obtained by DP.
Using a matrix of the similarity scores, the UPGMA method (Sneath and
Sokal, 1973) constructs a guided tree. Sequences are merged to form a
multiple alignment following the bottom-up branching order of the
guided tree. Each internal node of the tree joins two groups of
sequences, to which group-to-group DP is applied.
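
The tree construction step can be sketched as follows. This is a
minimal illustration in Python, not the code in mktree: the function
name upgma and the nested-tuple tree representation are assumptions
made for this example. It merges, at each step, the two clusters with
the highest average pairwise similarity.

```python
from itertools import combinations

def upgma(sim):
    """Build a tree from a symmetric similarity matrix (list of lists).

    Returns a nested tuple of sequence indices; each tuple is one merge.
    """
    # cluster id -> (subtree, list of member sequence indices)
    clusters = {i: (i, [i]) for i in range(len(sim))}
    next_id = len(sim)

    def average(m1, m2):
        # Average similarity over all cross pairs (average linkage).
        return sum(sim[i][j] for i in m1 for j in m2) / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Pick the most similar pair of clusters.
        x, y = max(combinations(sorted(clusters), 2),
                   key=lambda p: average(clusters[p[0]][1], clusters[p[1]][1]))
        tree_x, members_x = clusters.pop(x)
        tree_y, members_y = clusters.pop(y)
        clusters[next_id] = ((tree_x, tree_y), members_x + members_y)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

Reading the nested tuple from the innermost pairs outward gives the
bottom-up order in which the sub-alignments would be merged.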

The group-to-group DP optimizes the alignment between groups.  The
score to be optimized is the sum of all pairwise alignment scores
between the groups.  Each pairwise alignment score is derived from
the similarity values between amino acids and a linear gap penalty,
a+bk, where k is the length of the gap and a and b are the
gap-opening and gap-extension costs. The optimizing operation in DP
is the same as Algorithm C, explained in detail by Gotoh (1993).  In
the other algorithms described below, the same type of DP is used to
align two sub-alignments.
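
As an illustration of the score being optimized (not of the DP
itself), the sketch below evaluates the sum of pairwise scores between
two already-aligned groups. The names pairwise_score and group_score
are hypothetical, sim is any user-supplied similarity function, and
it is assumed here that gap penalties are subtracted from the
similarity total and that columns where both rows have a gap are
ignored.

```python
def pairwise_score(s, t, sim, a, b, gap="-"):
    """Score two aligned rows of equal length.

    The first column of a gap costs a + b and each further column costs b,
    so a gap of length k costs a + b*k in total.
    """
    score = 0
    prev = None  # row holding a gap in the previous kept column: "s", "t" or None
    for x, y in zip(s, t):
        if x == gap and y == gap:
            continue  # column is empty in this pair; skip it
        if x == gap:
            score -= b + (a if prev != "s" else 0)
            prev = "s"
        elif y == gap:
            score -= b + (a if prev != "t" else 0)
            prev = "t"
        else:
            score += sim(x, y)
            prev = None
    return score

def group_score(group1, group2, sim, a, b):
    """Sum of pairwise scores over every cross pair of the two groups."""
    return sum(pairwise_score(s, t, sim, a, b)
               for s in group1 for t in group2)
```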


Round-robin iterative algorithm

Barton and Sternberg (1987) proposed the simplest iterative
improvement concept for refining an alignment obtained by a
tree-based algorithm. In this method, group-to-group DP realigns each
sequence against the alignment of all the other sequences. This
process is repeated in a round-robin manner.

A round-robin iterative algorithm applies this refinement method to
an arbitrary initial multiple alignment, normally one with no gaps in
the sequences to be aligned. Sequence S1 is first aligned with the
alignment of sequences S2...Sn (having first removed any gaps that
are common to S2...Sn). S2 is then realigned with the alignment of
S1, S3...Sn. This process is repeated until Sn has been realigned
with S1, S2...Sn-1. The complete cycle is repeated until no change
occurs.
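
The control flow of this cycle can be sketched as below. This is an
outline only: align_groups stands for a user-supplied group-to-group
DP routine (a hypothetical callable taking two lists of rows and
returning the two realigned lists), and max_cycles is an assumed
safety limit that is not part of the description above.

```python
def strip_common_gaps(rows, gap="-"):
    """Remove columns that contain a gap in every row."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width, gap) for r in rows]
    keep = [c for c in range(width) if any(r[c] != gap for r in padded)]
    return ["".join(r[c] for c in keep) for r in padded]

def round_robin(rows, align_groups, max_cycles=100, gap="-"):
    """Realign each sequence against all the others until nothing changes."""
    rows = list(rows)
    for _ in range(max_cycles):
        before = list(rows)
        for i in range(len(rows)):
            single = [rows[i].replace(gap, "")]          # sequence Si, ungapped
            rest = strip_common_gaps(rows[:i] + rows[i + 1:], gap)
            new_single, new_rest = align_groups(single, rest)
            # Put Si back at its original position.
            rows = new_rest[:i] + new_single + new_rest[i:]
        if rows == before:
            break
    return rows
```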


Random iterative algorithm

The original iterative improvement algorithm starting from a no-gap
alignment was proposed by Berger and Munson (1991). Random numbers
play an important role in this iterative algorithm, as follows.

First, an initial N-sequence alignment is input into an iteration
cycle.  The sequences are divided by random numbers into two groups:
a k-sequence alignment and an (N-k)-sequence alignment. The two
partial alignments are then recombined by group-to-group DP. Since
the score of the resulting alignment is always better than or equal
to the previous one, the new alignment is used as the starting point
of the next iteration cycle. In this way, repeated application of the
iteration cycle gradually improves the whole alignment. The iteration
terminates when all possible partitions give no improvement. The
quality of the final result depends mainly on how effectively the
partitions have been tested in the iteration cycles.

The random iterative algorithm requires a huge number of iteration
cycles to solve a practical problem. An N-sequence alignment has
2^(N-1) - 1 ways of partitioning: more than 2,000,000 partitions
when N=22. To be practicable, a heuristic technique is needed to
significantly restrict the search space and reduce execution time.
We studied three restricted partitioning techniques: single-type
partitioning, double-type partitioning, and tree-dependent
partitioning.
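
The partition count can be checked with a few lines of Python. The
function names are chosen for this example only. The brute-force
enumeration keeps sequence 0 in the first group so that each split
into two non-empty groups is counted once.

```python
from itertools import combinations

def count_partitions(n):
    """Number of ways to split n sequences into two non-empty groups."""
    return 2 ** (n - 1) - 1

def enumerate_partitions(n):
    """List every split, identified by the group that contains sequence 0."""
    found = []
    # k = how many other sequences join sequence 0; at least one must stay out.
    for k in range(n - 1):
        for extra in combinations(range(1, n), k):
            found.append(frozenset((0,) + extra))
    return found

print(count_partitions(22))            # 2097151
print(len(enumerate_partitions(5)))    # 15
```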

single-type partitioning: The number of sequences in the smaller
   sub-alignment of partitioning is restricted to one, while the other
   sub-alignment has N-1 sequences when the number of aligned sequences
   is N.  Since the number of possible partitions is N, the order of
   partitioning complexity is reduced from 2^N to N with this
   partitioning technique.

double-type partitioning: The number of sequences in the smaller
   sub-alignment of partitioning is restricted to one or two, while the
   other sub-alignment has N-1 or N-2 sequences. There are N(N+1)/2
   possible partitions. The order of partitioning complexity, N^2, is
   larger than that of single-type partitioning.

tree-dependent partitioning: Partitioning is restricted to the ways
   indicated by branches of a guided tree. There are 2N-3 branch
   separations when the number of sequences is N (Allison et al.,
   1992). Construction of the guided tree is based on the current
   multiple alignment at the beginning of each iteration cycle
   (Figure 1). This technique takes the similarity of the aligned
   sequences into account. Although this partitioning technique
   requires overhead for constructing the guided trees, the order of
   partitioning complexity, N, is the same as that of single-type
   partitioning.
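
The three restricted partition sets can be enumerated as in the sketch
below. All names are hypothetical, and the guided tree is assumed to
be a nested tuple of sequence indices. Each partition is identified
by one of its two groups. The two branches at the root of a rooted
tree describe the same separation, so they are counted once, which
gives 2N-3.

```python
from itertools import combinations

def single_type(n):
    """One sequence against the other n-1."""
    return [frozenset([i]) for i in range(n)]

def double_type(n):
    """One or two sequences against the rest."""
    return single_type(n) + [frozenset(p) for p in combinations(range(n), 2)]

def leaves(tree):
    """Set of sequence indices below a node of a nested-tuple tree."""
    if isinstance(tree, int):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])

def tree_dependent(tree):
    """One partition per branch of the tree."""
    everything = leaves(tree)
    splits = set()

    def walk(node):
        if isinstance(node, int):
            return
        for child in node:
            side = leaves(child)
            other = everything - side
            # Store a canonical side so both root branches map to one split.
            splits.add(min(side, other, key=lambda s: (len(s), sorted(s))))
            walk(child)

    walk(tree)
    return splits
```

For N=5 these give 5, 15 and 7 partitions respectively. For N below 5
some double-type partitions coincide, so the list produced by
double_type contains duplicates of the same split.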

The three techniques were incorporated in a random iterative
algorithm. In the iteration cycle, random numbers are used to select
each possible partition with equal probability. These techniques
allow the iterative algorithm to solve a practical multiple alignment
problem.
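
A minimal outline of the random iteration loop follows, under stated
assumptions: realign(rows, part) is a hypothetical callable that
splits the alignment at the given partition and recombines it by
group-to-group DP, score is a hypothetical scoring callable, and the
list of partitions is kept fixed (with tree-dependent partitioning it
would be rebuilt at the start of each cycle).

```python
import random

def random_iterative(rows, partitions, realign, score, rng=None):
    """Repeat randomly chosen realignments until no partition improves."""
    rng = rng or random.Random(0)   # seeded here only for reproducibility
    best = score(rows)
    failed = set()                  # partitions tried since the last improvement
    while len(failed) < len(partitions):
        part = rng.choice(partitions)        # each partition equally likely
        candidate = realign(rows, part)
        new_score = score(candidate)
        if new_score > best:
            rows, best = candidate, new_score
            failed.clear()                   # every partition is worth retrying
        else:
            failed.add(part)
    return rows
```

The loop stops once every partition has been tried against the
current alignment without raising the score.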


Best-first iterative algorithm

The random iterative algorithm selects a partition randomly, whereas
the best-first iterative algorithm tests all possible partitions in
each iteration cycle and selects the best alignment (Figure 1).
Restricted partitioning techniques are also required in the algorithm
to solve practical problems.
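
For comparison with the random search, the best-first cycle can be
outlined as below. As before, realign and score are hypothetical
user-supplied callables (group-to-group DP at a given partition, and
the alignment score), and partitions is assumed to be a non-empty
fixed list.

```python
def best_first(rows, partitions, realign, score):
    """Test every partition in each cycle and keep the best result."""
    best = score(rows)
    while True:
        candidates = [realign(rows, part) for part in partitions]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:
            return rows              # no partition improves the alignment
        rows, best = top, top_score
```

Each cycle costs one DP per partition, which is why the size of the
restricted partition set matters for this algorithm.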


Iterative improvement after tree-based alignment

This algorithm is a simple combination of the tree-based algorithm and
an iterative algorithm.  Alignment obtained by the tree-based
algorithm is refined by an iterative algorithm.


Tree-based iterative algorithm

The tree-based iterative algorithm combines the iterative
improvement strategy with the tree-based algorithm (Figure 2). Each
alignment is refined by an iterative algorithm just after its two
sub-alignments are merged in a tree-based way (Subbiah and Harrison,
1989). Different search schemes, such as random and best-first, give
different variants of the tree-based iterative algorithm. Restricted
partitioning techniques can reduce execution time.


