A framework for large scale phylogenetic analysis
by Bruno Miguel Leandro Lourenço
MSc thesis presentation and discussion.
Date: 2021-Jan-22 Time: 16:30 Room: Zoom
Abstract: With growing exchanges of people and merchandise between countries, epidemics have become an issue of increasing importance, and huge amounts of data are being collected every day. As a result, analyses that used to run on personal computers are no longer feasible there, and it is now common to run such tasks in high-performance computing environments and/or on dedicated systems. At the same time, these analyses often deal with graphs and trees, and run algorithms to find patterns in such structures. Yet although graph-oriented databases and processing systems can be of much help in this setting, as far as we know there is no solution relying on these technologies to address the challenges of large scale phylogenetic analysis. This work aims to develop a modular framework that exploits such technologies, namely Neo4j. We address this challenge by proposing and developing a framework, delivered as a Neo4j plugin, that allows representing large phylogenetic networks and trees together with ancillary data, supports queries on such data, and allows the deployment of algorithms for inferring or detecting patterns and for pre-computing visualizations. The framework is innovative and brings several advantages to the phylogenetic analysis process, such as the management of phylogenetic trees, which avoids having to compute them again, and the use of multilayer networks, which makes comparing them more efficient and scalable. The experimental evaluation shows that the framework can be very efficient in the most frequently used operations and that the running times of the supported algorithms are consistent with their expected time complexity.