Re: Ceph 16.2.x mon compactions, disk writes

Zakhar Kirpichenko <zakhar@xxxxxxxxx> · Fri, 13 Oct 2023 20:51:10 +0300

> Some of it is transferable to RocksDB on mons nonetheless.

Please point me to the relevant Ceph documentation, i.e. a description of how
various Ceph monitor and RocksDB tunables affect the operation of
monitors, and I'll gladly look into it.

> Please point me to such recommendations, if they're on docs.ceph.com I'll
> get them updated.

These are the recommendations we used when we built our Pacific cluster:
https://docs.ceph.com/en/pacific/start/hardware-recommendations/

Our drives are 4 times larger than recommended by this guide. The drives
are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
storage of rarely modified files. It is not documented or suggested
anywhere that monitor processes write several hundred gigabytes of data per
day, exceeding the amount of data written by OSDs. That is why I am not
convinced that what we're observing is expected behavior, but it's not easy
to get a definitive answer from the Ceph community.
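
To show why a write volume of this size matters for a low-DWPD drive, the
arithmetic can be sketched as below. This is only an illustration under
stated assumptions: the function names are hypothetical, and the capacity,
daily write volume, rating and warranty period in the example are
placeholder values, not measurements from this cluster or figures from any
datasheet.

```python
def drive_writes_per_day(written_gb_per_day: float, capacity_gb: float) -> float:
    """Observed DWPD: data written per day divided by drive capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return written_gb_per_day / capacity_gb


def years_to_rated_endurance(
    written_gb_per_day: float,
    capacity_gb: float,
    rated_dwpd: float,
    warranty_years: float,
) -> float:
    """Years until the rated write budget is used up at the observed rate.

    Assumes the rating means `rated_dwpd` full drive writes per day
    sustained over `warranty_years`; check the datasheet for the
    actual definition used by the vendor.
    """
    if written_gb_per_day <= 0:
        raise ValueError("written_gb_per_day must be positive")
    rated_total_gb = rated_dwpd * capacity_gb * 365 * warranty_years
    return rated_total_gb / (written_gb_per_day * 365)


# Placeholder inputs (hypothetical): 500 GB drive, 300 GB written per day,
# rated for 0.3 DWPD over a 5-year warranty.
observed = drive_writes_per_day(300, 500)
lifetime = years_to_rated_endurance(300, 500, rated_dwpd=0.3, warranty_years=5)
print(f"observed DWPD: {observed:.2f}, years to rated endurance: {lifetime:.2f}")
```

Substituting the real drive capacity, the rated DWPD and the write volume
reported by the drive's SMART counters gives an estimate of how quickly the
rated endurance would be consumed.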

/Z

On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri <anthony.datri@xxxxxxxxx>
wrote:

> Some of it is transferable to RocksDB on mons nonetheless.
>
> but their specs exceed Ceph hardware recommendations by a good margin
>
>
> Please point me to such recommendations, if they're on docs.ceph.com I'll
> get them updated.
>
> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>
> Thank you, Anthony. As I explained to you earlier, the article you had
> sent is about RocksDB tuning for Bluestore OSDs, while the issue at hand is
> not with OSDs but rather with monitors and their RocksDB store. Indeed, the
> drives are not enterprise-grade, but their specs exceed Ceph hardware
> recommendations by a good margin. They're being used as boot drives only
> and aren't supposed to be written to continuously at high rates, which is
> unfortunately what is happening. I am trying to determine why it is
> happening and how the issue can be alleviated or resolved. Unfortunately,
> monitor RocksDB usage and tunables appear not to be documented at all.
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri <anthony.datri@xxxxxxxxx>
> wrote:
>
>> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
>> Reef you would experience fewer writes.  Universal compaction might also
>> help, but in the end this SSD is a client SKU and really not suited for
>> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
>> could change the overprovisioning on the ones you have.
>>
>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>>
>> I would very much appreciate it if someone with a better understanding of
>> monitor internals and use of RocksDB could please chip in.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx