Graph Processing: Distributed Frameworks, Approximations and Compact Data Structures by Miguel Carvalho Valente Esaguy Coimbra PhD thesis presentation and discussion. Date: 2021-Sep-17 Time: 14:30 Room: Zoom Abstract: This work was pursued in context of big data, dynamic graphs and the different architectures and approaches used to process and store graphs. Herein an overview is provided of the state-of-the-art of the graph processing landscape as result of a survey on the techniques and systems to computationally represent graphs, different architectures of systems that process them and graph databases. These aspects of the landscape are interconnected and relate to questions that emerge as different needs for graph processing take form. Given the increasing dimension of data sets represented as graphs in the advent of big data, how far can one relax from exact to approximate computing? And to what extent can performance be gained with a trade-off in accuracy in distributed system graph-processing scenarios? Employing distributed systems, how may one scale-out methods of graph clustering (as an example of a relevant graph algorithm) to process large graphs while harnessing the computational power of distributed computing infrastructures? How easy is it to adopt incremental solutions for the processing of dynamic graphs? Could existing graph processing results be updated based solely on previous results and a given change, or set of changes, in the graph? These questions are approached with the VEILGRAPH contribution, a study and system for approximate graph processing using a summary structure applied to the case of stream processing. With it, different approximation parameters in a cloud computational environment are evaluated. As part of this work, the focus was also cast on the potential to improve distributed systems (community network micro-clouds) by defining computational groups and leader election as a graph processing problem. Results inherent to this were obtained with the GELLY-SCHEDULING contribution, where the properties of community network micro-clouds are studied. A relevant aspect underlying these questions is the computational representation of graphs. In what way may compact graph representations be used to manipulate big dynamic graphs? The answer to this question is aligned with the others, as the choice of graph representation has direct influence on performance. This aspect is explored with a dynamic implementation of the $k^2$-tree data structure to represent graphs, developed throughout the research effort. It is a dynamic compact graph data structure, of which a comparative performance study for different implementations is presented. Overall, graph processing has become an intersection of multiple perspectives, all interconnected. This thesis highlights these tendencies and their relations through the developed research and contributions.