Hi Mark, thank you for the prompt answer.

> The fact that changing the pg_num for the index pool drops the latency
> back down might be a clue. Do you have a lot of deletes happening on
> this cluster? If you have a lot of deletes and long pauses between
> writes, you could be accumulating tombstones that you have to keep
> iterating over during bucket listing.

What you describe looks very close to our case of periodic creation of
checkpoints. Now it sounds like this could be our issue.

> Those get cleaned up during compaction. If there are no writes, you
> might not be compacting the tombstones away enough. Just a theory, but
> when you rearrange the PG counts, Ceph does a bunch of writes to move
> the data around, triggering compaction and deleting the tombstones.
>
> In v17.2.7 we enabled a feature that automatically performs a compaction
> if too many tombstones are present during iteration in RocksDB. It
> might be worth upgrading to see if it helps (you might have to try
> tweaking the settings if the defaults aren't helping enough). The PR is
> here:
>
> https://github.com/ceph/ceph/pull/50893

Thank you very much for this idea! We'll upgrade the cluster to v17.2.7
and check whether it helps. If not, we'll try to tune the options you
are referring to. Either way, I'll update the thread with the results.

Thank you once again for the well-explained suggestion, Mark!

--
Thank you,
Roman
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
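
For anyone finding this thread later, a minimal sketch of how the tombstone-compaction
settings mentioned above might be adjusted. The option names
(rocksdb_cf_compact_on_deletion, rocksdb_cf_compact_on_deletion_trigger,
rocksdb_cf_compact_on_deletion_sliding_window) are assumed from the feature introduced
by the PR above, and the values are purely illustrative; verify both against the
documentation for your exact release before applying anything:

    # Sketch only: option names assumed from PR 50893, values are examples,
    # not recommendations. This would apply the tombstone-triggered compaction
    # tuning to all OSDs at runtime via the monitor config database.
    ceph config set osd rocksdb_cf_compact_on_deletion true
    ceph config set osd rocksdb_cf_compact_on_deletion_sliding_window 32768
    ceph config set osd rocksdb_cf_compact_on_deletion_trigger 8192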