How We Export Billion-Scale Graphs on Transactional Graph Databases

- The article describes how a scalable export pipeline was built for large-scale graphs stored in transactional graph databases.
- A NuGraph analytics plugin, built on the open-source graph database JanusGraph, performs parallel scans and exports the graph to HDFS.
- Techniques such as separating offline graph export from online traffic, handling super nodes, and managing JVM memory are applied to keep the database performant.
- FoundationDB, the distributed transactional key/value store backing NuGraph, executes each graph query as a single transaction, as sketched after this list.
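To make the transactional model concrete, here is a minimal sketch of a point read executed inside a single FoundationDB transaction via the official Java bindings. The API version (7.1) and the `vertex/123` key layout are illustrative assumptions; NuGraph's actual key encoding is defined by JanusGraph's storage layer.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;

public class FdbTransactionSketch {
    public static void main(String[] args) {
        // Select a client API version once per JVM (7.1 assumed here).
        FDB fdb = FDB.selectAPIVersion(710);
        try (Database db = fdb.open()) {
            // db.run() executes the lambda inside one ACID transaction,
            // retrying automatically on transient conflicts.
            byte[] value = db.run(tr ->
                tr.get("vertex/123".getBytes()).join()); // hypothetical key layout
            System.out.println(value == null ? "not found" : new String(value));
        }
    }
}
```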
Challenges in Exporting Large-scale Graphs
- Graph analytics that require a full scan of the graph cannot be served directly by transactional query execution on large graphs.
- Offline graph export therefore needs to be separated from online transactional query traffic.
- Big-data processing platforms such as Spark and HDFS can then run graph analytics queries over the exported graph stored in HDFS, as illustrated below.
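To show what the analytics side might look like, here is a minimal Spark sketch that runs a whole-graph aggregation over an exported edge list in HDFS. The path `hdfs:///nugraph/export/edges`, the Parquet format, and the `src` column are assumptions, not the actual export schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class ExportedGraphAnalytics {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("exported-graph-analytics")
            .getOrCreate();

        // Hypothetical location and schema of the exported edge list.
        Dataset<Row> edges = spark.read().parquet("hdfs:///nugraph/export/edges");

        // A whole-graph query (out-degree per source vertex) that would be
        // impractical as an online transactional query on the live database.
        edges.groupBy("src")
             .count()
             .orderBy(col("count").desc())
             .show(10);

        spark.stop();
    }
}
```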
The Solution
- A Disaster Recovery (DR) cluster is set up to synchronize data from the primary database cluster, so graph export runs against the replica without interfering with online query traffic.
- FoundationDB's client programming APIs do not support direct full scans, so parallel scans are implemented with Spark as the scan engine.
- The NuGraph analytics plugin manages the partition mapping between the backend store and Spark for efficient graph export; a sketch of this mapping follows.
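Under stated assumptions, the sketch below illustrates the idea of mapping backend partitions to Spark tasks: the driver builds a list of key ranges (hard-coded placeholders here; a real exporter would derive splits from the cluster's data distribution), each Spark partition range-reads one slice through its own FoundationDB client, and the results land in HDFS. The `KeyRange` class and the paths are hypothetical, and a production exporter would paginate its reads to respect FoundationDB's per-transaction limits.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.KeyValue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ParallelGraphScan {
    // Hypothetical value class: one backend key range per Spark partition.
    static class KeyRange implements Serializable {
        final byte[] begin, end;
        KeyRange(byte[] begin, byte[] end) { this.begin = begin; this.end = end; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("nugraph-export").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Placeholder boundaries; a real exporter would derive range splits
        // from the cluster's data distribution instead of hard-coding them.
        List<KeyRange> ranges = new ArrayList<>();
        ranges.add(new KeyRange(new byte[] {0x00}, new byte[] {0x40}));
        ranges.add(new KeyRange(new byte[] {0x40}, new byte[] {(byte) 0x80}));
        ranges.add(new KeyRange(new byte[] {(byte) 0x80}, new byte[] {(byte) 0xff}));

        // One Spark partition per key range: each executor task opens its own
        // client and range-reads only its slice of the key space.
        JavaRDD<String> rows = jsc.parallelize(ranges, ranges.size()).flatMap(range -> {
            // selectAPIVersion returns the existing singleton when the same
            // version was already selected in this JVM.
            FDB fdb = FDB.selectAPIVersion(710);
            List<String> out = new ArrayList<>();
            try (Database db = fdb.open()) {
                db.read(tr -> {
                    // A production exporter would paginate here to respect
                    // FoundationDB's per-transaction time and size limits.
                    for (KeyValue kv : tr.getRange(range.begin, range.end)) {
                        out.add(new String(kv.getKey())); // decode graph rows here
                    }
                    return null;
                });
            }
            return out.iterator();
        });

        // Land the scanned slices in HDFS for downstream analytics.
        rows.saveAsTextFile("hdfs:///nugraph/export/scan");
        spark.stop();
    }
}
```

Scanning the ranges in parallel keeps each task's read footprint small and lets Spark's scheduler balance the export work across executors.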