Towards OLAP in Graph Databases
2013 Oct 09Direto do Another Word for It: Towards OLAP in Graph Databases (MSc. Thesis) by Michal Bachman. Abstract:
Graph databases are becoming increasingly popular as an alternative to relational databases for managing complex, densely-connected, semi-structured data. Whilst primarily optimised for online transactional processing, graph databases would greatly benefit from online analytical processing capabilities. Since relational databases were introduced over four decades ago, they have acquired online analytical processing facilities; this is not the case with graph databases, which have only drawn mainstream attention in the past few years. In this project, we study the problem of online analytical processing in graph databases that use the property graph data model, which is a graph with properties attached to both vertices and edges. We use vertex degree analysis as a simple example problem, create a formal definition of vertex degree in a property graph, and develop a theoretical vertex degree cache with constant space and read time complexity, enabled by a cache compaction operation and a property change frequency heuristic. We then apply the theory to Neo4j, an open-source property graph database, by developing a Relationship Count Module, which implements the theoretical vertex degree caching. We also design and implement a framework, called GraphAware, which provides supporting functionality for the module and serves as a platform for additional development, particularly of modules that store and maintain graph metadata. Finally, we show that for certain use cases, for example those in which vertices have relatively high degrees and edges are created in separate transactions, vertex degree analysis can be performed several orders of magnitude faster, whilst sacrificing less than 20% of the write throughput, when using GraphAware Framework with the Relationship Count Module. By demonstrating the extent of possible performance improvements, exposing the true complexity of a seemingly simple problem, and providing a starting point for future analysis and module development, we take an important step towards online analytical processing in graph databases.
The MSc. thesis: GraphAware: Towards Online Analytical Processing in Graph Databases. Framework at Github: GraphAware Neo4j Framework. Michal laments:
It’s not an easy, cover-to-cover read, but there might be some interesting parts, even if you don’t go through all the (over 100) pages.
It’s one hundred and forty-nine pages according to my PDF viewer. I don’t think Michal needs to worry. If anyone thinks it is too long to read, it’s their loss. Definitely going on my short list of things to read in detail sooner rather than later.