How We Export Billion-Scale Graphs on Transactional Graph Databases

- The article describes how a scalable export pipeline was built for large-scale graphs stored in transactional graph databases.
- A NuGraph analytics plugin, built on the open-source graph database JanusGraph, performs parallel scans and exports the graph to HDFS.
- Techniques such as separating offline graph export from online traffic, handling super nodes, and managing JVM memory are applied to keep the database performant.
- FoundationDB, the distributed transactional key/value store backing NuGraph, executes each graph query as a single transaction, as sketched after this list.
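To make the transactional model concrete, here is a minimal sketch of a point read executed inside a single FoundationDB transaction via the official Java bindings. The API version (7.1) and the `vertex/123` key layout are illustrative assumptions; NuGraph's actual key encoding is defined by JanusGraph's storage layer.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;

public class FdbTransactionSketch {
    public static void main(String[] args) {
        // Select a client API version once per JVM (7.1 assumed here).
        FDB fdb = FDB.selectAPIVersion(710);
        try (Database db = fdb.open()) {
            // db.run() executes the lambda inside one ACID transaction,
            // retrying automatically on transient conflicts.
            byte[] value = db.run(tr ->
                tr.get("vertex/123".getBytes()).join()); // hypothetical key layout
            System.out.println(value == null ? "not found" : new String(value));
        }
    }
}
```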
Challenges in Exporting Large-scale Graphs
- Graph analytics that require a full scan of the graph cannot be served directly by transactional query execution on large graphs.
- Offline graph export therefore needs to be separated from online transactional query traffic.
- Big-data processing platforms such as Spark and HDFS can then run graph analytics queries over the exported graph stored in HDFS, as illustrated below.
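To show what the analytics side might look like, here is a minimal Spark sketch that runs a whole-graph aggregation over an exported edge list in HDFS. The path `hdfs:///nugraph/export/edges`, the Parquet format, and the `src` column are assumptions, not the actual export schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class ExportedGraphAnalytics {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("exported-graph-analytics")
            .getOrCreate();

        // Hypothetical location and schema of the exported edge list.
        Dataset<Row> edges = spark.read().parquet("hdfs:///nugraph/export/edges");

        // A whole-graph query (out-degree per source vertex) that would be
        // impractical as an online transactional query on the live database.
        edges.groupBy("src")
             .count()
             .orderBy(col("count").desc())
             .show(10);

        spark.stop();
    }
}
```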
The Solution
- A Disaster Recovery (DR) cluster is set up to synchronize data from the primary database cluster, so graph export runs against the replica without interfering with online query traffic.
- FoundationDB's client programming APIs do not support direct full scans, so parallel scans are implemented with Spark as the scan engine.
- The NuGraph analytics plugin manages the partition mapping between the backend store and Spark for efficient graph export; a sketch of this mapping follows.
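Under stated assumptions, the sketch below illustrates the idea of mapping backend partitions to Spark tasks: the driver builds a list of key ranges (hard-coded placeholders here; a real exporter would derive splits from the cluster's data distribution), each Spark partition range-reads one slice through its own FoundationDB client, and the results land in HDFS. The `KeyRange` class and the paths are hypothetical, and a production exporter would paginate its reads to respect FoundationDB's per-transaction limits.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.KeyValue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ParallelGraphScan {
    // Hypothetical value class: one backend key range per Spark partition.
    static class KeyRange implements Serializable {
        final byte[] begin, end;
        KeyRange(byte[] begin, byte[] end) { this.begin = begin; this.end = end; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("nugraph-export").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Placeholder boundaries; a real exporter would derive range splits
        // from the cluster's data distribution instead of hard-coding them.
        List<KeyRange> ranges = new ArrayList<>();
        ranges.add(new KeyRange(new byte[] {0x00}, new byte[] {0x40}));
        ranges.add(new KeyRange(new byte[] {0x40}, new byte[] {(byte) 0x80}));
        ranges.add(new KeyRange(new byte[] {(byte) 0x80}, new byte[] {(byte) 0xff}));

        // One Spark partition per key range: each executor task opens its own
        // client and range-reads only its slice of the key space.
        JavaRDD<String> rows = jsc.parallelize(ranges, ranges.size()).flatMap(range -> {
            // selectAPIVersion returns the existing singleton when the same
            // version was already selected in this JVM.
            FDB fdb = FDB.selectAPIVersion(710);
            List<String> out = new ArrayList<>();
            try (Database db = fdb.open()) {
                db.read(tr -> {
                    // A production exporter would paginate here to respect
                    // FoundationDB's per-transaction time and size limits.
                    for (KeyValue kv : tr.getRange(range.begin, range.end)) {
                        out.add(new String(kv.getKey())); // decode graph rows here
                    }
                    return null;
                });
            }
            return out.iterator();
        });

        // Land the scanned slices in HDFS for downstream analytics.
        rows.saveAsTextFile("hdfs:///nugraph/export/scan");
        spark.stop();
    }
}
```

Scanning the ranges in parallel keeps each task's read footprint small and lets Spark's scheduler balance the export work across executors.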