Improving Efficiency Of Goku Time Series Database at Pinterest (Part — 3)

Table of Contents
- Metrics Namespace
- Cost Savings
- Architecture Changes/Fixes For Cost Reduction
1. Metrics Namespace
- Previously, Goku applied one fixed set of storage properties to the metrics it stored, such as in-memory storage for recent data and disk storage for older data.
- Metrics can belong to multiple namespaces, each with different storage configurations.
- Namespace configurations are stored in a dynamic shared config file watched by all hosts in the Goku ecosystem.
- Data points are forwarded to Kafka topics based on the namespace configuration.
- The Observability team used these features to reduce the number of time series stored in GokuS by 37%.
- Storing less data also reduced disk usage on GokuL hosts.
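
The namespace idea above can be sketched in a few lines. This is a minimal illustration, not Goku's actual configuration format: the `NamespaceConfig` fields, the prefix-matching rule, the topic names, and the retention values are all assumptions made up for the example. It only shows how a metric can match more than one namespace and how each match selects a Kafka topic to forward data points to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceConfig:
    """Hypothetical storage settings for one namespace (field names are assumed)."""
    name: str
    metric_prefixes: tuple       # a metric belongs here if its name starts with any prefix
    kafka_topic: str             # topic that data points for this namespace are forwarded to
    memory_retention_hours: int  # how long recent data stays in memory
    disk_retention_days: int     # how long older data stays on disk


# Illustrative stand-in for the shared config file; every value is invented.
NAMESPACES = [
    NamespaceConfig("default", ("",), "goku_default", 24, 365),
    NamespaceConfig("host_level_short", ("host.",), "goku_host_short", 24, 30),
]


def topics_for_metric(metric_name, namespaces=NAMESPACES):
    """Return every Kafka topic a data point for metric_name should be forwarded to."""
    return [
        ns.kafka_topic
        for ns in namespaces
        if any(metric_name.startswith(prefix) for prefix in ns.metric_prefixes)
    ]


# A host-level metric matches both namespaces; any other metric matches only the default.
host_topics = topics_for_metric("host.cpu.user")
service_topics = topics_for_metric("service.latency")
```

In this sketch the empty prefix makes the default namespace match every metric, which is one simple way to express "a metric can belong to multiple namespaces". A real deployment would reload `NAMESPACES` whenever the watched config file changes.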
2. Cost Savings
- The Observability team analyzed metrics data and determined that a subset of older data did not need to be stored at host granularity.
- Using the namespace features in Goku, the team reduced the number of time series stored in GokuS from 16B to ~10B (about 37%).
- This reduction in stored data led to cost savings for the client team.
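
As a quick check that the two figures quoted above agree, the drop from 16B to ~10B time series can be converted to a percentage. The helper name `percent_reduction` is only for illustration; because the ~10B figure is approximate, the result should be read as "about 37%".

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


series_before = 16e9  # time series stored in GokuS before the change
series_after = 10e9   # approximate count after the change (~10B)

print(f"{percent_reduction(series_before, series_after):.1f}%")  # prints 37.5%
```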
3. Architecture Changes/Fixes For Cost Reduction
- The Goku architecture was changed to reduce cost, with design and code improvements in GokuS, Goku Compactor, and Goku Ingestor.
- Process memory analysis in GokuS revealed that metric name strings were consuming a large amount of host memory.
- Tracking the cumulative size of metric name strings showed that they consumed ~12 GB of memory per host and ~8 TB across a Goku cluster.
- Cluster machine hardware evaluation was conducted for GokuS, Goku Compactor, and GokuL to optimize resource usage and reduce costs.
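
The measurement of metric name strings described above can be approximated with a running counter. This is a sketch under stated assumptions, not the instrumentation used in GokuS: the `MetricNameSizeTracker` class and its methods are hypothetical, and it counts only the UTF-8 bytes of each name, ignoring allocator and container overhead that a real process would also pay.

```python
class MetricNameSizeTracker:
    """Hypothetical running total of bytes held by metric name strings on one host."""

    def __init__(self):
        self._sizes = {}      # series id -> byte length of its metric name
        self.total_bytes = 0  # cumulative size of all tracked names

    def add(self, series_id, metric_name):
        """Record a newly stored time series; repeated ids are ignored."""
        if series_id in self._sizes:
            return
        size = len(metric_name.encode("utf-8"))
        self._sizes[series_id] = size
        self.total_bytes += size

    def remove(self, series_id):
        """Forget a series that was evicted or deleted; unknown ids are a no-op."""
        self.total_bytes -= self._sizes.pop(series_id, 0)


tracker = MetricNameSizeTracker()
tracker.add(1, "host.cpu.user")    # 13 bytes
tracker.add(2, "service.latency")  # 15 bytes
tracker.add(2, "service.latency")  # duplicate id, not counted again
tracker.remove(1)                  # total drops back to 15 bytes
```

Exporting `total_bytes` as a gauge per host would make it possible to sum the values across hosts, which is the kind of aggregation that separate per-host and per-cluster figures imply.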