Project funded by FCT (PTDC/CCI-BIO/29676/2017), from 01/10/2018 to 30/09/2021.


The ability to rapidly sequence whole microbial genomes is revolutionizing microbiology and epidemiological surveillance. It has a high impact on the identification of antimicrobial resistance genes and virulence factors, which can have direct clinical application in the treatment of patients, and on the detection of outbreaks in hospital settings or in the food industry, e.g., by monitoring the spread of antimicrobial resistance, an ever-growing concern. It has also enabled more complex phylogenetic analyses based on whole-genome data. The bottleneck, however, has shifted to data analysis. From a computational point of view, a growing concern is how algorithms and tools can be scaled up to analyse thousands of genetic loci in thousands of isolates.

This project therefore aims to: (1) research and design efficient and scalable data structures and algorithms that allow phylogenetic analyses at large scales; (2) develop tools suitable for large-scale phylogenetic analysis, deployable in cloud and HPC environments; (3) make tools available as reusable components, enabling the construction of more complex parametrizable pipeline workflows; (4) develop and integrate intuitive and user-friendly interfaces.

The research team reflects the multidisciplinary character of the project, gathering researchers from two national research institutes, INESC-ID and IMM. Prior collaborations of this team in bioinformatics and computational biology focused on typing data analysis, phylogenetic inference and the development of software tools for these tasks.

Main topics

  1. Efficient and scalable algorithms for phylogenetic data analysis and processing.
  2. Population surveillance and outbreak detection.
  3. Modular platform for phylogenetic data integration, processing and visualization.

Publications

  1. Order-Preserving Pattern Matching Indeterminate Strings, by D. M. Costa, L. Russo, R. Henriques, H. Bannai and A. P. Francisco (preprint).
  2. An analysis of the graph processing landscape, by M. E. Coimbra, A. P. Francisco and L. Veiga (preprint).
  3. Community Finding with Applications on Phylogenetic Networks, by L. Rita (MSc thesis).
  4. Large scale phylogenetic inference from noisy data based on minimum weight spanning arborescences, by J. Espada (MSc thesis).
  5. Cache-Oblivious Nested Loops Based on Hilbert Curves, by J. F. Alves (MSc thesis).
  6. Incremental hypervolume calculation in d dimensions, by K. Yefimenko (MSc thesis).
  7. Simulation based approach to bacterial evolution, by M. Oliveira e Costa (MSc thesis).
  8. On dynamic succinct graph representations, by M. E. Coimbra, A. P. Francisco, L. M. S. Russo, G. De Bernardo, S. Ladra and G. Navarro (DCC'2020).
  9. Approximating Optimal Bidirectional Macro Schemes, by L. M. S. Russo, A. S. D. Correia, G. Navarro and A. P. Francisco (DCC'2020).
  10. Incremental Multiple Longest Common Sub-Sequences, by L. M. S. Russo, A. P. Francisco and T. Rocher (preprint).
  11. Hardness of Modern Games, by D. M. Costa, A. P. Francisco and L. M. S. Russo (preprint).
  12. Distance-based phylogenetic inference from typing data: a unifying view, by C. Vaz, M. Nascimento, J. A. Carriço, T. Rocher and A. P. Francisco (Briefings in Bioinformatics).
  13. Small Longest Tandem Scattered Subsequences, by L. M. S. Russo and A. P. Francisco (preprint).
  14. A graph algorithm library based on compact data structures, by J. M. Hrotko (MSc thesis).
  15. Library of efficient algorithms for phylogenetic analysis, by L. B. Silva (MSc thesis).
  16. A framework for large scale phylogenetic analysis, by B. Lourenço (MSc thesis).
  17. Range Minimum Queries in Minimal Space, by L. M. S. Russo (preprint).

The mailing lists

General discussion concerning this project.

To subscribe to a given list, just send mail to to know how to do it (where mailing-list-name is the name of your preferred list).