Pinterest Engineering

Improving Efficiency Of Goku Time Series Database at Pinterest (Part — 3)

Table of Contents

  1. Metrics Namespace
  2. Cost Savings
  3. Architecture Changes/Fixes For Cost Reduction

1. Metrics Namespace

  • Goku originally had a fixed set of properties for stored metrics, such as in-memory storage for recent data and disk storage for older data.
  • Metrics can belong to multiple namespaces, each with different storage configurations.
  • Namespace configurations are stored in a dynamic shared config file watched by all hosts in the Goku ecosystem.
  • Data points are forwarded to Kafka topics based on the namespace configuration.
  • The Observability team used these features to reduce the number of time series stored in GokuS by 37%.
  • Storing less data also reduced disk usage on GokuL hosts.
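
The namespace routing described above can be sketched as follows. This is a minimal illustration, not Goku's actual implementation: the `NamespaceConfig` fields, the prefix-matching rule, the topic names, and the retention values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceConfig:
    """Hypothetical namespace entry; field names are illustrative only."""
    name: str
    metric_prefixes: tuple[str, ...]  # assumed rule: match metrics by name prefix
    kafka_topic: str                  # topic that receives matching data points
    disk_retention_days: int          # example of a per-namespace storage setting


# Stand-in for the shared config file that every host watches.
# Topic names and retention values are made up.
NAMESPACES = [
    NamespaceConfig("default", ("",), "goku_default", 365),
    NamespaceConfig("short_lived", ("batch.",), "goku_short_lived", 1),
]


def topics_for(metric_name: str, namespaces: list[NamespaceConfig]) -> list[str]:
    """Return the Kafka topic of every namespace the metric belongs to."""
    return [
        ns.kafka_topic
        for ns in namespaces
        if any(metric_name.startswith(prefix) for prefix in ns.metric_prefixes)
    ]


print(topics_for("batch.job.duration", NAMESPACES))
print(topics_for("web.requests", NAMESPACES))
```

Because a metric can match more than one namespace, the function returns a list of topics rather than a single one, so the same data point can be forwarded to several storage configurations.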

2. Cost Savings

  • The Observability team analyzed metrics data and determined that a subset of older data did not need to be stored at host granularity.
  • With the help of features provided by Goku, the team reduced the number of time series stored in GokuS from 16B to ~10B.
  • This reduction in stored data led to cost savings for the client team.
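
As a quick check, the 16B to ~10B figures line up with the 37% reduction quoted in the previous section. The snippet below only redoes that arithmetic; since "~10B" is itself approximate, the result should be read as approximate too.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


series_before = 16_000_000_000  # 16B time series in GokuS
series_after = 10_000_000_000   # ~10B after the namespace changes

reduction = percent_reduction(series_before, series_after)
print(f"{reduction:.1f}% fewer time series")
```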

3. Architecture Changes/Fixes For Cost Reduction

  • Improvements and changes were made in the Goku architecture, including design and code improvements in GokuS, Goku Compactor, and Goku Ingestor.
  • Process memory analysis in GokuS revealed that metric name strings were consuming a large amount of host memory.
  • Tracking the cumulative size of metric name strings showed that they consumed ~12 GB of memory per host and ~8 TB across the Goku cluster.
  • Cluster machine hardware evaluation was conducted for GokuS, Goku Compactor, and GokuL to optimize resource usage and reduce costs.
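
The metric name measurement mentioned above amounts to keeping a running byte count as time series are created and removed. The sketch below shows one way to do that; the `MetricNameSizeTracker` class and its method names are hypothetical, and it counts only the string payload, not allocator or object overhead, so a real process would use somewhat more memory than this total suggests.

```python
class MetricNameSizeTracker:
    """Running total of bytes held by metric name strings on one host."""

    def __init__(self) -> None:
        self.total_bytes = 0
        self.series_count = 0

    def add_series(self, metric_name: str) -> None:
        # Each time series is assumed to hold its own copy of the name.
        self.total_bytes += len(metric_name.encode("utf-8"))
        self.series_count += 1

    def remove_series(self, metric_name: str) -> None:
        self.total_bytes -= len(metric_name.encode("utf-8"))
        self.series_count -= 1

    def average_bytes(self) -> float:
        """Mean name size per series, or 0.0 when nothing is tracked."""
        return self.total_bytes / self.series_count if self.series_count else 0.0


tracker = MetricNameSizeTracker()
for name in ["web.requests.count", "web.requests.latency.p99"]:
    tracker.add_series(name)

print(tracker.total_bytes, tracker.series_count, tracker.average_bytes())
```

Exposing a counter like this as a metric makes it possible to compare name-string memory against total process memory per host and to see whether a later change actually shrinks it.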