Skip to main content

Heuristics for multiobjective multiple sequence alignment

Abstract

Background

Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment.

Methods

We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments.

Results and conclusions

The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.

Background

Multiple sequence alignment (MSA) is of central importance to bioinformatics. This technique is useful to compare new sequences with other genomic sequences, unveiling their shared information and their significant differences. MSA methods are mainly essential to analyse biological sequences and to design applications in structure modeling, functional prediction, phylogenetic analysis and sequence database searching [1]. Currently, these approaches are also used in applications, for example, to compare protein structures, to predict protein mutations and interactions or to reconstruct phylogenetic trees [2, 3]. Moreover, these tools have found their place in medicine as well, mainly in the context of genetic screening and genetic engineering [4].

Most of the known alignment approaches have diverse optimization functions, along with assorted heuristics to search for the optimum alignment. These techniques consider a weighted sum formulation that maximizes the substitution score and penalizes indels/gaps. This is the usual procedure even in the case of pairwise sequence alignment [5]. However, the way of setting up the weights is very often not trivial. Moreover, analyzing only one alignment may lead to too simplistic interpretations of the data [6].

A multiobjective formulation of sequence alignment provides the practitioner a set of alignments that represents the trade-off between decreasing the number of gaps and increasing similarity. In bioinformatics, this formulation and algorithms can be found already for pairwise sequence (DNA/Protein) alignment  [710]. Abbasi et al. [7] present dynamic programming algorithms to compute the optimal set of alignments by treating the number of indels/gaps and the scores for (mis)matches/substitution as separate objectives. They also apply this method to analyze the construction of phylogenetic trees. Taneda [11] describes a heuristic approach for pairwise RNA sequence alignment that incorporates RNA structure information to approximate a set of optimal alignments. Schnattinger et al. [12] extend the work of Taneda by computing the optimal set. They treat the sequence alignment and the consensus structure calculation as separate objectives and solve both problems simultaneously with a dynamic programming approach. An extensive review about other problems in bioinformatics that are formulated as multiobjective optimization problems is explained in Handl et al. [13].

Few work has been done on multiobjective MSA (MMSA), which is much harder to be solved from a computational point of view. Only very recently, MSA has been treated as multiobjective optimization problem. Ortuño et al.  [14] used a multiobjective evolutionary algorithm based on the NSGA-II to optimize sum-of-pairs score, total columns and number of gaps. In another article [15], the authors extended the previous work by applying further biological features and considering different objectives such as strike score, non-gaps percentage and totally conserved columns. Likewise, Soto et al.  [16] applied a multiobjective evolutionary algorithm to optimize pre-aligned sequences by considering entropy and the metric Metal.

In this article, we propose to approach MMSA with heuristic methods, extending the formulation given in Abbasi et al. [7] for an arbitrary number of sequences. The approach considers the sum-of-pairs score vector by maximizing the substitution score, based on a substitution matrix, and minimizing the number of indels/gaps. This article is an extended version of a conference article [17] providing a more thorough experimental analysis of the algorithms proposed.

Methods

We first give a definition of multiple sequence alignment, followed by its multiobjective counterpart. Then, a local search strategy is proposed.

Multiple sequence alignment

In MSA, homogeneous characters of a group of sequences are aligned together in columns. The following definition introduces the problem of MSA in a more mathematical form [18].

Definition 1

Multiple sequence alignment. Let \(A_1 = (a_{1,1}, \ldots , a_{1,n_1} ), \ldots , A_m = (a_{m,1} ,\ldots , a_{m,n_m})\) be m strings over an alphabet \(\Sigma\). Let \('-' \not \in \Sigma\) be an indel symbol, let \(\Sigma ^\prime = \Sigma \cup \{-\}\) and let \(\lambda\) denote the empty string. Let \(h : (\Sigma ^\prime )^* \mapsto \Sigma ^*\) be a homomorphism defined by \(h(a) = a\) for all \(a \in \Sigma\), and \(h(-) = \lambda\). A multiple sequence alignment \(\phi\) of \((A_1 , \ldots , A_m )\) is a m-tuple of \(B_1= (b_{1,1}, \ldots , b_{1,\ell }), \ldots , B_m = (b_{m,1}, \ldots , b_{m,\ell })\) of strings of length \(\ell \ge \max \{A_i |1 \le i \le m\}\) over the alphabet \(\Sigma ^\prime\), such that the following conditions are satisfied:

  1. (i)

    \(|B_1 |= |B_2 |= \ldots = |B_m |\)

  2. (ii)

    \(h(B_i ) = A_i\) for all \(i \in \{1, \ldots , m\}\)

  3. (iii)

    For all \(j \in \{1, \ldots , \ell \}\) there exists an \(i \in \{1, \ldots , m\}\) such that \(b_{i,j} \ne '-'\).

A way of evaluating the quality of an alignment \(\phi\) is to score its columns by the sum-of-pairs score function. The score of a column \(j=1,\ldots ,\ell\) is defined as:

$$\begin{aligned} S(j) = \sum _{1\le i < k \le m} sc(b_{i,j}, b_{k,j}) \end{aligned}$$

where the score \(sc(b_{i,j}, b_{k,j})\), for \(b_{i,j}, b_{k,j} \in \Sigma\), comes from a substitution matrix used for scoring pairwise sequence alignments (such as PAM and BLOSUM). Indels are scored by defining \(sc('-', \cdot ) = sc(\cdot ,'-') = W_d\), where \(W_d\) is the weight of an indel, and \(sc('-','-') = 0\). The score for alignment \(\phi\) is computed as

$$\begin{aligned} SP(\phi ) = \sum _{j=1}^\ell S(j) \end{aligned}$$

Multiobjective multiple sequence alignment

Many problems of practical relevance are frequently characterized by different objectives that simultaneously have to be taken into account when it comes to solve the problem. In most cases, these objectives conflict with each other and optimizing the problem for a specific objective might compromise other objectives. An appropriate approach to a multiobjective problem is to obtain a set of solutions in such a way that each of which cannot be improved in one objective without deteriorating at least one of the others [19].

Traditionally, approaches for solving sequence alignment are performed with a single objective function, which is based on a weighted sum of matches, mismatches, insertions and deletions. However, there is no agreement on how to specify weights for these parameters. In a multiobjective formulation, the practitioner does not have to define weights and can get access to much more information [10].

Let \(\phi\) be an alignment of m sequences \((A_1,\ldots ,A_m)\), as defined in the previous section. We define the following two score functions

$$\begin{aligned} S_s(j) = \sum _{1\le i< k \le m} s(b_{i,j}, b_{k,j}) \quad \text{ and } \quad S_d(j) = \sum _{1\le i < k \le m} d(b_{i,j}, b_{k,j}) \end{aligned}$$

where the score \(s(b_{i,j}, b_{k,j})\), for \(b_{i,j}, b_{k,j} \in \Sigma\) is obtained from a substitution matrix and \(d(b_{i,j}, b_{k,j})\), for \(b_{i,j}, b_{k,j} \in \Sigma ^\prime\), is 1 if either \(b_{i,j}= {\text{'}}-{\text{'}}\) or \(b_{k,j}= {\text{'}}-{\text{'}}\), and 0, otherwise. The multiobjective score sum-of-pairs for alignment \(\phi\) is

$$\begin{aligned} BSP(\phi ) = \left( BSP_s(\phi ) = \sum _{j=1}^\ell S_s(j),BSP_d(\phi ) = \sum _{j=1}^\ell S_d(j)\right) \end{aligned}$$

Given two alignments \(\phi\) and \(\phi ^\prime\), \(BSP(\phi ) \ge BSP(\phi ^\prime )\) (\(\phi\) dominates \(\phi ^\prime\)) if and only if it holds that \(BSP_s(\phi ) \ge BSP_s(\phi ^\prime )\) and \(BSP_d(\phi ) \le BSP_d(\phi ^\prime )\), with at least one strict inequality. An alignment \(\phi ^*\) is Pareto optimal if there exists no other alignment \(\phi\) such that \(BSP(\phi ) \ge BSP(\phi ^*)\). The set of all Pareto optimal alignments is called Pareto optimal alignment set. The image of a Pareto optimal alignment in the score space is a non-dominated score and the set of all non-dominated scores is called non-dominated score set.

Although pairwise sequence alignment can be solved efficiently, that is, the running time to find the non-dominated score set is a polynomial function of the size of the sequences, this is no longer the case for the multiple counterpart for an arbitrary number of sequences. Thus, the goal, in practice, is to find an approximation to the non-dominated score set in a reasonable amount of time. In this article we explore the application of heuristic methods, in particular, local search algorithms.

Pareto local search

A local search algorithm starts from a feasible solution and searches locally for better neighbors to replace the current one. This neighborhood search is repeated until no improvement is found anymore and the algorithm stops in a local optimum. In the context of MMSA, the neighborhood function associates a set of feasible alignments \(N(\phi )\) to every feasible alignment \(\phi\). An alignment \(\phi\) is a Pareto local optimum if there exists no alignment \(\phi ^\prime\) in \(N(\phi )\) such that \(BSP(\phi ^\prime ) \ge BSP(\phi )\).

Pareto local search (PLS) [20] is a generic local search framework for multiobjective optimization problems that follows the principle above by keeping the best alignments into a special data structure, known as the archive. Each neighboring alignment that is non-dominated with respect to the alignments in the archive, is added to it. The algorithm “naturally” terminates once there is no neighbor of any alignment in the archive that is not dominated (a local optimal set). See [20, 21] for more details about this approach for multiobjective combinatorial optimization problems and its successful application to scheduling and graph problems.

In the following we describe some options that were considered to apply PLS to MMSA: neighborhood function and starting alignment.

Neighborhood

We consider a k-block neighborhood for this problem, which consists of exchanging a substring of at most k characters in \(\Sigma\) with indels in a gap, that is, a contiguous sequence of indels. In the following, we establish the conditions for two alignments to be k-block neighbors. Let \(A_1 = (a_{1,1}, \ldots , a_{1,n_1} )\), ..., \(A_m = (a_{m,1} ,\ldots , a_{m,n_m})\) denote m strings and let \(\phi _B\) and \(\phi _C\) be two alignments, where \(\phi _B = B_1,\ldots , B_m\) and \(\phi _C = C_1,\ldots , C_m\). The alignments \(\phi _B\) and \(\phi _C\) are k-block neighbors if and only if the following conditions hold:

  1. (i)

    \(|B_i |= |C_i |\), for \(i= 1,\ldots ,m\);

  2. (ii)

    Let \(J = \{ j \ | \ j \in \{1,\ldots ,m\} : B_j \ne C_j \}\); then \(|J |= 1\);

  3. (iii)

    For \(j \in J\) let \(\pi _{B_j}^i\) and \(\pi _{C_j}^i\) denote the position of \(a_{j,i} \in A_j\) in string \(B_j\) and \(C_j\), respectively. Let \(I_{B_j} = \{\pi _{B_j}^i \mid \pi _{B_j}^i \ne \pi _{C_j}^i, i=1,\ldots ,|A_j |\}\) and \(I_{C_j} = \{\pi _{C_j}^i \mid \pi _{B_j}^i \ne \pi _{C_j}^i, i=1,\ldots ,|A_j |\}\). Then, \(\ell = |I_{B_j} |= |I_{C_j}|\le k\).

  4. (iv)

    Let \(I_{B_j} = \{ I_{B_j,1},\ldots , I_{B_j,\ell }\}\) and \(I_{C_j} = \{ I_{C_j,1},\ldots , I_{C_j,\ell }\}\); then \(I_{B_j,i+1} - I_{B_j,i} = I_{C_j,i+1} - I_{C_j,i} = 1\), for \(i=1,\ldots , \ell -1\).

Conditions (i) and (ii) state that both alignments should have the same size, and differ only in the jth string, respectively. In addition, condition (iii) and (iv) state that at most k characters from string \(A_j\) do not occupy the same position in both alignments, and that those characters are contiguous. For illustration purpose, consider the following alignment:

figure a

We list all possible 2-block neighbors as follows:

figure b

It is possible to visit all k-block neighbors of an alignment \(\phi\) in a straightforward way. Given the first string of an alignment \(\phi\), consider the leftmost substring of size one that contains only a character and an indel to its right. Then, exchange it with every indel in its right and stop when no indel is found. In case the substring has also indels to its left, repeat the same exchange procedure. If it is still possible, increase the size of the substring by one and repeat the same moves to its right and to its left; repeat the overall procedure until reaching a substring of size k. Then, consider the next substring of size one to the right and repeat the same procedure as above. Note that each move generates a new k-block neighbor of alignment \(\phi\). In order to maintain feasibility, the columns that contain only indels are deleted from the alignment.

Starting alignment

The starting solution may have a strong impact on the overall performance of the local search. We considered the three following possibilities for PLS:

  1. (i)

    Rand: A random feasible alignment, which is obtained by inserting indels randomly into the strings, except in the largest one. This initialization option tends to generate a feasible alignment with a small number of indels.

  2. (ii)

    Clust: A feasible alignment obtained from program Clustal Omega [22] (available at http://www.clustal.org/omega).

  3. (iii)

    Tcoffee: A feasible alignment obtained from program Tcoffee [23] (available at http://www.tcoffee.org/Projects/tcoffee/).

We choose Tcoffee since it is one of the best consistency based methods that outperforms the others existing programs in terms of accuracy and Clustal Omega since it outperforms Tcoffee whenever sequences with large N/C terminal extensions exist [24]. We used the default parameters suggested for these two programs.

figure c

Pareto iterated local search

The local search framework described in the previous section may get trapped in a local optima set of poor quality. One possibility for escaping from local optima is to perturb one or more alignments in the archive and restart the local search from those alignments. Pareto iterated local search (PILS) implements this search behavior and its main steps are presented in Algorithm 1. Similar to the single objective counterpart (see [25]), four components have to be specified in PILS:

  1. (i)

    GenerateInitialSolution, which generates an initial set of alignments \(\Phi _0\); based on the experimental results, we consider two starting solutions previously described (Tcoffee and Clust).

  2. (ii)

    Perturbation, which modifies some of the current alignments in set \(\Phi\) leading to some intermediate set of alignments \(\Phi ^\prime\). Based on preliminary experiments, we decided to randomly choose 5 alignments from set \(\Phi\) in such a way that the corresponding non-dominated scores are spread enough in the objective space. Then, a gap of a given size is inserted into each sequence in random positions.

  3. (iii)

    PLS, which is applied to \(\Phi ^\prime\) and returns an improved set of alignments \(\Phi ^{\prime \prime }\).

  4. (iv)

    AcceptanceCriterion, which consists of merging the two sets, \(\Phi\) and \(\Phi ^{\prime \prime }\), and filtering out the dominated scores.

Discussion and results

We performed several experiments in order to understand the effect of different parameters of PLS and PILS and to establish functional relationships between algorithm performance and instance features, such as the number of sequences and their sizes. Furthermore, we compare the quality of the alignments produced by our approaches with those produced by TCoffee and Clustal Omega.

The implementations were coded in C and compiled with GCC version 4.6.1 with the −O3 compiler option. Experiments were performed in a cluster with 16 nodes, each one comprising an Intel Core i7 CPU with 4-core and 2 GB Ram, with operating system Ubuntu 11.10. Except the compiler option, no other code optimization technique was used in the experiments. For this experimental analysis, we have chosen sequences obtained from the benchmark database BAliBASE 3.0 [26]. In our substitution matrix, each cell value is 1 when the both residues are equal, and −1 otherwise.

BAliBASE 3.0 is a well-known benchmark for evaluating the multiple sequence alignment algorithms. The data set Reference 1 consists of equidistant family sequences with two subgroups: RV11 and RV12. RV11 contains 38 data sets with less than 20 % residue identity between groups and RV12 contains 44 data sets with residue identity between 20 and 40 %. Reference 2 (RV20) contains 41 alignments comprising family sequences with more than 40 % similarity and a highly divergent orphan sequence. Reference 3, with 30 data sets, contains subfamilies such that the sequences within a given subfamily share more than 40 % identity, but any two sequences from different subfamilies share less than 20 % identity. The reference set RV40, with 49 data sets, contains sequences that are composed of groups with N/C-terminal extensions. The reference set RV50 contains 16 data sets with large internal insertions. We tested our approaches on all the instances from the sets RV11 and RV20, composed by sequences with different sizes and varying percentage identities.

Since the heuristics proposed in this paper are stochastic, we ran each variant 10 times for each instance and recorded the contents of the archive for each run, namely the approximate set, once the termination criterion was met. To evaluate the quality of each approximate set, we computed its hypervolume indicator value. This indicator measures the area that is dominated by the approximate set, bounded by a reference point [27]; see an in-depth discussion about the hypervolume indicator in relation with other performance assessment methods in Zitzler et al. [28]. Moreover, it is known that the hypervolume is maximized when the optimal set is found [28].

Figure 1 illustrates the hypervolume indicator (shaded area) for a given approximate set (black points) and a given reference point (white point). We have chosen, as the reference point for each instance, the minimum substitution score minus one and the maximum number of indels plus one that were obtained from the runs of all variants. We merged all approximate sets produced from all runs for a given instance, extracted the non-dominated scores and computed the hypervolume indicator value of the resulting set, namely, the reference hypervolume indicator value, which is then used as a reference value to evaluate the relative performance of each approach. Then, for each approximate set, we computed its hypervolume indicator value and the percentage of this value with respect to the reference hypervolume indicator value; the larger the value, the better is the approximate set in terms of this indicator.

Fig. 1
figure 1

Illustration of the hypervolume indicator [17]

We analysed the effect of different k-values in the k-block neighborhood, as well as the effect of the three starting alignments on the performance of PLS. For a given instance, let Min denote the length of the smallest string; we consider \(k \in \{\mathtt{Min},\mathtt{Min}/2,\mathtt{Min}/4,\mathtt{Min}/8,\mathtt{Min}/16\}\). PLS terminates once it is not possible to find non-dominated neighboring alignments or the time limit of 5 minutes of CPU-time is reached. Tables 1 and 2 report the results obtained for PLS for the benchmark sets RV11 and RV20. The results were averaged over 10 runs, for each of the five k values and the three starting alignments. Column id corresponds to the instance id from the two benchmark sets (BB1100*.tfa and BB200*.tfa, where * denotes the id); column m gives the number of sequences; columns Min and Max correspond to the length of the smallest and the largest sequence, respectively. The instances are lexicographically ordered in terms of the number of sequences and the length of the smallest sequence. The values in italics correspond to the best result obtained for each instance. The last column Init shows the hypervolume indicator obtained for the non-dominated score of the two starting alignments obtained with Clustal Omega and Tcoffee.

Table 1 Percentage with respect to the reference hypervolume indicator value for local search with different k-block neighborhood sizes and three starting alignments (Clustal Omega, Tcoffee and Rand) for each benchmark instance in dataset RV11. The results are averaged over 10 runs. See text for more details
Table 2 Percentage with respect to the reference hypervolume indicator value for local search for different k-block neighborhood sizes and starting solutions (Clustal Omega, Tcoffee and Rand) for each benchmark instancein dataset RV20. The results are averaged over 10 runs. See text for more details

The results clearly indicate the best performance of PLS when starting with alignments Clust and Tcoffee, instead of a feasible random alignment. Also, the best k-block value strongly depends on the number of sequences and, to a smaller extent, to their sizes. As a rule, for the given cut-off time, large (small) k values achieve better performance on problems with a smaller (larger) number of sequences. Moreover, column Init indicates that PLS strongly improves upon the two starting alignments.

Tables 3 and 4 report results obtained for PILS on the same benchmark sets, averaged over 10 runs, for k-block size equals to 2; we recall that we use Clust and Tcoffee as starting alignments. The results reported in the tables do not include the CPU-time taken to compute the starting alignments. PILS always terminates once the time limit of 5 minutes of CPU-time is reached. In this case, the limit time for each PLS run within PILS is \(5/(P+1)\) minutes. Column \(\mathtt{PLS}^{max}\) gives the best value of PLS from Tables 1 and 2. The best results for each sequence set are shown in italics. In most of the instances, PILS improves over PLS, although the best performance may depend on the number of sequences and their size: a small (large) number of perturbations gives better performance on larger (smaller) sequences and larger (smaller) number of sequences.

Table 3 Percentage with respect to the reference hypervolume indicator value for PILS with several perturbation number (P) for each instances of data set RV11 [17]. The results are averaged over 10 runs. See text for more details
Table 4 Percentage with respect to the reference hypervolume indicator value for PILS with several perturbation number (P) for each instances of data set RV20. The results are averaged over 10 runs. See text for more details

We complemented the analysis to gain insight into the absolute quality of the alignments produced by PILS, when compared to the reference alignments available in the BAliBASE benchmark. We rely on the correctly aligned pairs (SP) and columns correctly aligned (TC) measures, as performed in [15]. These ratios can be computed by the program BAliScore which is available by ftp from ftp-igbmc.u-strasbg.fr/pub/BAliBASE [29]. We used seven different data sets from BAliBASE with specific features chosen from this benchmark; see Table 5. Columns Reference and id correspond to the reference and id number of the data set in the BAliBASE benchmark; columns m, Min and Max correspond to the number of sequences, length of the smallest and largest sequence, respectively; Len corresponds to the size of reference alignment in the BAliBASE benchmark.

Table 5 The selective data sets from difference reference of the benchmark

We ran PILS 10 times on these data sets, and for each collection of runs, we chose the best alignment produced in terms of \(BSP_s\) score. We computed the SP and TC ratios by using this alignment and the reference alignment available for each chosen data set in BAliBASE. For calculation of the SP ratio, we used the substitution matrix PAM 250. Moreover, for comparison purpose, we performed the same procedure for the alignments produced by Clustal Omega and Tcoffee.

Tables 6 and 7 show the results obtained with SP and TC ratios, respectively. The italics face values represent the best ratio. From Table 6, it can be observed that PILS obtained the best value for all the tested data sets. In Table 7, the best TC ratio is either from Tcoffee or Clustal Omega. Note that a null TC value was obtained by the alignment of Clustal Omega in RV12 id 20 and RV20 id 20, and by TCoffee in set from RV20 and id 1.

Table 6 The results of Sp score in Tcoffee, Clustal Omega and PILS on the selected test cases
Table 7 The results of TC score in Tcoffee, Clustal Omega and PILS on the selected test cases

Conclusions

The local search introduced in this article is able to provide high quality alignments in a reasonable amount of time. Since local search can be stuck in local optima, we propose a method that perturbs the set of alignments in the archive and restarts the local search, such as done by iterated local search. The results show that the hypervolume indicator has improved over different number of perturbations.

The size of the trade-off returned by our approaches is still considerably large, which may be overwhelming for the practitioner. One possibility is to present a smaller subset of alignments that are representative of the complete trade-off; such notion of representativeness may be based on metrics of uniformity (the most spread subset) or coverage (the subset that best covers the complete set). Algorithms that allow to find optimal representative subsets are available in [30].

For the future work, we might try other starting alignments that are not just based on the similarity but also taking into account other biological criteria such as the secondary structure of the alignments. Further, we may test different perturbation methods. One of the possibilities is to use the phylogenetic trees of the sequences and apply the ratchet strategy [31, 32] which may help to create a new starting solution with more information from existing results.

References

  1. Pei J. Multiple protein sequence alignment. Curr Opin Struct Biol. 2008;18(3):382–6.

    Article  Google Scholar 

  2. Gelly JC, Joseph AP, Srinivasan N, De Brevern AG. iPBA: a tool for protein structure comparison using sequence alignment strategies. Nucleic Acids Res. 2011;39(suppl 2):18–23.

    Article  Google Scholar 

  3. Wang LS, Leebens Mack J, Wall PK, Beckmann K, de Pamphilis CW, Warnow T. The impact of multiple protein sequence alignment on phylogenetic estimation. IEEE/ACM Trans Comput Biol Bioinform. 2011;8(4):1108–19.

    Article  Google Scholar 

  4. Setlow JK. Genetic engineering: principles and methods, vol. 28. New York: Springer; 2007.

    Book  Google Scholar 

  5. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–53.

    Article  Google Scholar 

  6. Morrison DA. Is sequence alignment an art or a science? Syst Bot. 2015;40(1):14–26.

    Article  Google Scholar 

  7. Abbasi M, Paquete L, Liefooghe A, Pinheiro M, Matias P. Improvements on bicriteria pairwise sequence alignment: algorithms and applications. Bioinformatics. 2013;29(8):996–1003.

    Article  Google Scholar 

  8. DeRonne KW, Karypis G. Pareto optimal pairwise sequence alignment. IEEE/ACM Trans Comput Biol Bioinform. 2013;10(2):481–93.

    Article  Google Scholar 

  9. Paquete L, Matias P, Abbasi M, Pinheiro M. Mosal: software tools for multiobjective sequence alignment. Source Code Biol Med. 2014;9(1):2.

    Article  Google Scholar 

  10. Roytberg MA, Semionenkov MN, Tabolina OI. Pareto-optimal alignment of biological sequences. Biophysics. 1999;44(4):565–77.

    Google Scholar 

  11. Taneda A. Multi-objective pairwise RNA sequence alignment. Bioinformatics. 2010;26(19):2383–90.

    Article  Google Scholar 

  12. Schnattinger T, Schöning U, Kestler H. Structural RNA alignment by multi-objective optimization. Bioinformatics. 2013;29(13):1607–13.

    Article  Google Scholar 

  13. Handl J, Kell DB, Knowles J. Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Trans Comput Biol Bioinform. 2007;4(2):279–92.

    Article  Google Scholar 

  14. Ortuño FM, Florido JP, Urquiza JM, Pomares H, Prieto A, Rojas I. Optimization of multiple sequence alignment methodologies using a multiobjective evolutionary algorithm based on nsga-ii. In: IEEE Congress on evolutionary computation (CEC). New York: IEEE; 2012. pp. 1–8.

  15. Ortuño FM, Valenzuela O, Rojas F, Pomares H, Florido JP, Urquiza JM, Rojas I. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics. 2013;29(17):2112–21.

    Article  Google Scholar 

  16. Soto W, Becerra D. A multi-objective evolutionary algorithm for improving multiple sequence alignments. In: Advances in bioinformatics and computational biology—Proceedings of the 9th Brazilian symposium on bioinformatics, BSB 2014, Belo Horizonte, Brazil, October 28–30; 2014. pp. 73–82.

  17. Abbasi M, Paquete L, Pereira FB. Local search for multiobjective multiple sequence alignment. In: Ortuño F, Rojas I, editors. Bioinformatics and biomedical engineering: Proceedings of the third international conference, IWBBIO 2015, Granada, Spain, April 15–17, part II. Cham: Springer; 2015. p. 175–82.

    Google Scholar 

  18. Bockenhauer HJ, Bongartz D. Algorithmic aspects of bioinformatics. Berlin: Springer; 2007.

    MATH  Google Scholar 

  19. Ehrgott M. Multicriteria optimization. Berlin: Springer; 2005.

    MATH  Google Scholar 

  20. Paquete L, Schiavinotto T, Stützle T. On local optima in multiobjective combinatorial optimization problems. Ann Oper Res. 2007;156(1):83–97.

    Article  MathSciNet  MATH  Google Scholar 

  21. Liefooghe A, Humeau J, Mesmoudi S, Jourdan L, Talbi E. On dominance-based multiobjective local search: design, implementation and experimental analysis on scheduling and traveling salesman problems. J Heuristics. 2012;18(2):317–52.

    Article  Google Scholar 

  22. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins DG. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539.

    Article  Google Scholar 

  23. Notredame C, Higgins DG, Heringa J. T-coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302(1):205–17.

    Article  Google Scholar 

  24. Pais FS, de Ruy P, Oliveira G, Coimbra R. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol. 2014;9(1):4.

    Article  Google Scholar 

  25. Lourenço HR, Martin OC, Stützle T. Iterated local search: handbook of metaheuristics. International series in operations research and management science, vol. 57. Berlin: Kluwer Academic Publishers; 2013. p. 321–53.

    Google Scholar 

  26. Thompson JD, Koehl P, Ripp R, Poch O, Thompson JD, Koehl P, Ripp R, Poch O. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins. 2005;61(1):127–36.

    Article  Google Scholar 

  27. Zitzler E, Thiele L. Multiobjective optimization using evolutionary algorithms—a comparative case study. Proceedings of the 5th international conference on parallel problem solving from nature, PPSN. Berlin: Springer; 1998. p. 292–304.

  28. Zitzler E, Thiele L, Laumanns M, Fonseca CM, da Fonseca V. Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans Evol Comput. 2003;7(2):117–32.

    Article  Google Scholar 

  29. Thompson JD, Plewniak F, Poch O. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 1999;27(13):2682–90.

    Article  Google Scholar 

  30. Vaz D, Paquete L, Fonseca CM, Klamroth K, Stiglmayr M. Representation of the non-dominated set in biobjective discrete optimization. Comput Oper Res. 2015;63:172–86.

    Article  MathSciNet  Google Scholar 

  31. Nixon KC. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics. 1999;15(4):407–14.

    Article  Google Scholar 

  32. Morrison DA. Increasing the efficiency of searches for the maximum likelihood tree in a phylogenetic analysis of up to 150 nucleotide sequences. Syst Biol. 2007;56(6):988–1010.

    Article  Google Scholar 

Download references

Authors’ contributions

MA implemented the algorithms, did the experimental analysis and drafted the manuscript. LP and FBP participated in the discussion, provided suggestions and comments. All authors read and approved the final manuscript.

Acknowledgements

The authors acknowledge one of the anonymous reviewers for suggestions of future work. M. Abbasi thanks the Fundação para a Ciência e Tecnologia (FCT), Portugal, for the Ph.D. Grant SFRH/BD/91451/2012. This work was financed by national funding via FCT and by the European Regional Development Fund (FEDER), through the COMPETE 2020 Operational Program for Competitiveness and Internationalization (POCI). The publication costs were funded by the University of Coimbra.

Competing interests

The authors declare that they have no competing interests.

Declarations

The publication costs for this article were funded by the University of Coimbra.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Luís Paquete.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Abbasi, M., Paquete, L. & Pereira, F.B. Heuristics for multiobjective multiple sequence alignment. BioMed Eng OnLine 15 (Suppl 1), 70 (2016). https://doi.org/10.1186/s12938-016-0184-z

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/s12938-016-0184-z

Keywords