store: disk usage continually increasing over time #7473

Open
pgier opened this issue Jun 20, 2024 · 4 comments

Comments

@pgier
Contributor

pgier commented Jun 20, 2024

Thanos, Prometheus and Golang version used:
Thanos version: quay.io/thanos/thanos:v0.35.1
Golang: whatever version was used to build the quay image
Prometheus: quay.io/prometheus/prometheus:v2.47.0

Object Storage Provider: AWS S3

What happened: Disk usage (k8s PV) continually increases; it is currently up to about 210 GB.

What you expected to happen: Disk usage would stabilize at a certain point

How to reproduce it (as minimally and precisely as possible): I'm not sure how to reproduce it other than to install Thanos with a store gateway and some metrics sources, and then periodically run queries.

Full logs to relevant components:

See attached log file.

Anything else we need to know:

thanos-store.log

@pgier
Contributor Author

pgier commented Jun 20, 2024

Possibly related to #7029

@mdraijer

Same here.

We have 9 different Thanos stacks running, each with a store gateway. Most of them have limited disk usage, as you would expect from the phrase "It acts primarily as an API gateway and therefore does not need significant amounts of local disk space", and also "It keeps a small amount of information about all remote blocks on local disk".

However, one of them has constantly increased in used disk space. Even after cleaning up and restarting, it starts filling up again. We have increased its disk to 150Gi now, whereas all other store gateway disks are 20Gi in size and 5%-30% filled.

Recently one of the other store gateways also started to fill up.

Why do some stores have so much local data and others not?

@harry671003
Contributor

Each store-gateway downloads the index-header for the blocks it's responsible for.

IIRC, Thanos by default doesn't have sharding enabled. Maybe you could try to shard the store-gateways so that not every store-gateway will download all the block index-headers.
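To illustrate the general idea of sharding, here is a minimal sketch that assumes each block is assigned to exactly one shard by hashing a key and taking it modulo the shard count. The function names, the use of the block ID as the key, and the choice of MD5 are all assumptions for illustration; this is not Thanos code and may not match how Thanos hashes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (hypothetically, a block ID) to a shard index in [0, num_shards)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the last 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[-8:], "big") % num_shards


def assign_blocks(block_ids, num_shards):
    """Group block IDs by shard so each shard only handles its own subset."""
    shards = {i: [] for i in range(num_shards)}
    for block_id in block_ids:
        shards[shard_for(block_id, num_shards)].append(block_id)
    return shards


if __name__ == "__main__":
    # Made-up block names, purely for demonstration.
    blocks = [f"block-{i}" for i in range(10)]
    for shard, members in assign_blocks(blocks, 3).items():
        print(shard, members)
```

With a scheme like this, each of N store-gateways would only need local index-headers for roughly 1/N of the blocks, assuming the keys hash evenly.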

I think there are two ways to shard store-gateways:

@pgier
Contributor Author

pgier commented Jun 27, 2024

@harry671003 I would think that would cause disk usage at startup, or when the overall number of metrics increases, but I'm seeing a gradual and consistent increase in disk usage, maybe 1-2 GB per week.
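One way to narrow down where a gradual increase like this comes from is to snapshot the size of each top-level entry in the store's data directory and compare snapshots taken some time apart. The following is a rough sketch under that assumption; the helper names and the example path are hypothetical, and nothing here relies on a particular Thanos directory layout.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # File disappeared between listing and stat; skip it.
    return total


def sizes_by_child(data_dir: str) -> dict:
    """Return {entry name: size in bytes} for each immediate child of data_dir."""
    result = {}
    with os.scandir(data_dir) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                result[entry.name] = dir_size_bytes(entry.path)
            else:
                result[entry.name] = entry.stat(follow_symlinks=False).st_size
    return result


def growth(before: dict, after: dict) -> dict:
    """Return the size change per entry, omitting entries that did not change."""
    names = set(before) | set(after)
    deltas = {n: after.get(n, 0) - before.get(n, 0) for n in names}
    return {n: d for n, d in deltas.items() if d != 0}


if __name__ == "__main__":
    # Hypothetical path; substitute the directory backed by the PV.
    snapshot = sizes_by_child("/var/thanos/store")
    for name, size in sorted(snapshot.items(), key=lambda kv: -kv[1])[:20]:
        print(f"{size:>15} {name}")
```

Comparing two snapshots with `growth(before, after)` would show whether the increase comes from new entries accumulating or from existing entries getting larger.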
