eBay Tech Blog

How We Export Billion-Scale Graphs on Transactional Graph Databases

  • The article discusses the development of a scalable export pipeline for billion-scale graphs stored in transactional graph databases.

  • A NuGraph analytics plugin is built over the open-source graph database JanusGraph to perform parallel scans and export the graph to HDFS.

  • Techniques such as offline graph export separation, handling super nodes, and JVM memory management are applied to improve the database's performance.

  • FoundationDB, a distributed transactional key/value store, supports each graph query as a transaction for NuGraph.
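As a rough illustration of the storage model the summary describes, the sketch below models how a vertex's adjacency can live in an ordered key/value store and be answered by a single range read, in the spirit of JanusGraph over FoundationDB. The `KVStore` class and the `e/<src>/<dst>` key layout are hypothetical stand-ins for illustration, not NuGraph's actual schema.

```python
# Illustrative sketch only: a toy ordered key/value store standing in for
# FoundationDB, with out-edges grouped under a per-vertex key prefix so
# that one range read (one transactional operation) returns the neighbors.

class KVStore:
    """Toy ordered key/value store (hypothetical stand-in)."""
    def __init__(self):
        self.data = {}

    def set(self, key: bytes, value: bytes):
        self.data[key] = value

    def get_range(self, prefix: bytes):
        # An ordered range read, as a transactional store would expose.
        return sorted((k, v) for k, v in self.data.items()
                      if k.startswith(prefix))

def edge_key(src: int, dst: int) -> bytes:
    # Hypothetical layout: "e/<src>/<dst>" groups a vertex's out-edges
    # under a common prefix, so one range read returns its neighbors.
    return f"e/{src:08d}/{dst:08d}".encode()

store = KVStore()
for src, dst in [(1, 2), (1, 3), (2, 3)]:
    store.set(edge_key(src, dst), b"")

# A "graph query as a transaction": one range read over vertex 1's prefix.
neighbors = [int(k.decode().split("/")[2])
             for k, _ in store.get_range(b"e/00000001/")]
print(neighbors)  # -> [2, 3]
```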

Challenges in Exporting Large-scale Graphs

  • Graph analytics workloads that require a full scan of the graph cannot be served directly by transactional query execution on large graphs.

  • Offline graph export needs to be separated from online transactional query traffic.

  • Big-data processing platforms like Spark and HDFS can be used for graph analytics queries on the exported graph stored in HDFS.
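For a sense of why the export helps, once the edge list sits in HDFS an analytics query such as per-vertex out-degree becomes a simple batch aggregation. The snippet below sketches this in plain Python as a stand-in for a Spark job; the edge data and names are made up for illustration.

```python
# Sketch: once edges are exported (e.g., to HDFS), analytics that would be
# prohibitive as online transactions become simple batch aggregations.
# Plain Python stands in for a Spark job; the data is hypothetical.

from collections import Counter

exported_edges = [(1, 2), (1, 3), (2, 3), (3, 1)]  # (src, dst) pairs

# Rough equivalent of edges.map(lambda e: (e[0], 1)).reduceByKey(add)
# in a Spark pipeline over the exported edge list.
out_degree = Counter(src for src, _ in exported_edges)
print(dict(out_degree))  # -> {1: 2, 2: 1, 3: 1}
```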

The Solution

  • A Disaster/Recovery (DR) cluster is set up to synchronize data from the primary database cluster, so graph export can run against it without interfering with online query traffic.

  • FoundationDB's client APIs do not support direct full scans, so parallel scans are implemented with Spark as the scan engine.

  • The NuGraph analytics plugin manages the partition mapping between the backend store and Spark for efficient graph export.
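The parallel-scan idea above can be sketched as splitting a sorted key space into contiguous ranges that independent scan tasks (e.g., Spark partitions) read in parallel. The function below is an illustrative assumption about how such a split might look, not NuGraph's actual partition-mapping code.

```python
# Sketch of dividing a sorted key space into non-overlapping ranges so
# that parallel scan tasks (e.g., Spark partitions) can each read one
# range independently. Boundary choice and names are illustrative.

def split_key_space(start: int, end: int, num_partitions: int):
    """Divide [start, end) into contiguous, non-overlapping ranges."""
    step = (end - start) // num_partitions
    ranges = []
    for i in range(num_partitions):
        lo = start + i * step
        # The last range absorbs any remainder so the union covers [start, end).
        hi = end if i == num_partitions - 1 else lo + step
        ranges.append((lo, hi))
    return ranges

# Each range becomes one scan task; together they cover the key space.
parts = split_key_space(0, 1000, 4)
print(parts)  # -> [(0, 250), (250, 500), (500, 750), (750, 1000)]
```

Each range can then be handed to one Spark task, which issues bounded range reads against the backend store instead of a single (unsupported) full scan.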