Re: Ceph 16.2.x mon compactions, disk writes

The issue persists, although to a lesser extent. Any comments from the Ceph
team, please? A sketch of how the monitor write rate can be sampled is
included below, and a back-of-the-envelope endurance estimate follows the
quoted thread.
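
A minimal sketch of how the sustained write rate of a ceph-mon process could
be sampled, assuming a Linux host where the process is visible to pidof and
/proc/<pid>/io is readable (i.e. run as root); it only measures process-level
write_bytes, so any device-level write amplification is not included:

#!/usr/bin/env python3
# Sketch: sample how much a local ceph-mon process writes over a short
# interval, using /proc/<pid>/io (Linux only, needs sufficient privileges).
# The "pidof ceph-mon" lookup is an assumption about the host setup; in a
# containerized deployment the process may need to be located differently.
import subprocess
import time

INTERVAL = 60  # seconds between the two samples

def mon_pid():
    # Assumes a single ceph-mon process on this host.
    return int(subprocess.check_output(["pidof", "ceph-mon"]).split()[0])

def write_bytes(pid):
    # write_bytes in /proc/<pid>/io counts bytes the process caused to be
    # sent to the block layer.
    with open(f"/proc/{pid}/io") as f:
        for line in f:
            if line.startswith("write_bytes:"):
                return int(line.split()[1])
    raise RuntimeError(f"write_bytes not found for pid {pid}")

pid = mon_pid()
before = write_bytes(pid)
time.sleep(INTERVAL)
after = write_bytes(pid)
rate = (after - before) / INTERVAL
print(f"ceph-mon pid {pid}: {rate / 1024 / 1024:.1f} MiB/s, "
      f"~{rate * 86400 / 1024**3:.0f} GiB/day if sustained")

The daily figure is a straight extrapolation from a single interval, so it
is only good for judging the order of magnitude.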

/Z

On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> > Some of it is transferable to RocksDB on mons nonetheless.
>
> Please point me to the relevant Ceph documentation, i.e. a description of
> how the various Ceph monitor and RocksDB tunables affect the operation of
> the monitors, and I'll gladly look into it.
>
> > Please point me to such recommendations; if they're on docs.ceph.com I'll
> > get them updated.
>
> These are the recommendations we used when we built our Pacific cluster:
> https://docs.ceph.com/en/pacific/start/hardware-recommendations/
>
> Our drives are 4x larger than recommended by this guide. The drives are
> rated for < 0.5 DWPD, which is more than sufficient for boot drives and for
> storing rarely modified files. It is not documented or suggested anywhere
> that monitor processes write several hundred gigabytes of data per day,
> exceeding the amount of data written by the OSDs, which is why I am not
> convinced that what we're observing is expected behavior. It has also not
> been easy to get a definitive answer from the Ceph community.
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri <anthony.datri@xxxxxxxxx>
> wrote:
>
>> Some of it is transferable to RocksDB on mons nonetheless.
>>
>> "but their specs exceed Ceph hardware recommendations by a good margin"
>>
>>
>> Please point me to such recommendations; if they're on docs.ceph.com I'll
>> get them updated.
>>
>> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>>
>> Thank you, Anthony. As I explained earlier, the article you sent is about
>> RocksDB tuning for Bluestore OSDs, while the issue at hand is not with the
>> OSDs but with the monitors and their RocksDB store. Indeed, the drives are
>> not enterprise-grade, but their specs exceed the Ceph hardware
>> recommendations by a good margin, they are used as boot drives only, and
>> they aren't supposed to be written to continuously at high rates - which
>> is unfortunately what is happening. I am trying to determine why it is
>> happening and how the issue can be alleviated or resolved, but
>> unfortunately monitor RocksDB usage and tunables appear not to be
>> documented at all.
>>
>> /Z
>>
>> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri <anthony.datri@xxxxxxxxx>
>> wrote:
>>
>>> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
>>> Reef you would experience fewer writes.  Universal compaction might also
>>> help, but in the end this SSD is a client SKU and really not suited for
>>> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
>>> could change the overprovisioning on the ones you have.
>>>
>>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>>>
>>> I would very much appreciate it if someone with a better understanding of
>>> monitor internals and use of RocksDB could please chip in.
>>>
>>>
>>>
>>
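
P.S. To put rough numbers on the endurance concern raised in the quoted
thread, a back-of-the-envelope sketch; the capacity, DWPD rating, rating
period and daily write volume below are placeholder values for illustration,
not measurements from this thread:

# Back-of-the-envelope SSD endurance estimate. All inputs are placeholders
# and should be replaced with the actual drive specs and measured write rate.
CAPACITY_GB = 480          # hypothetical drive capacity
DWPD = 0.5                 # drive writes per day the SSD is rated for
RATING_YEARS = 5           # period over which the DWPD rating applies
OBSERVED_GB_PER_DAY = 300  # e.g. "several hundred gigabytes per day"

rated_gb_per_day = CAPACITY_GB * DWPD
rated_tbw = rated_gb_per_day * 365 * RATING_YEARS / 1000
years_to_exhaust = rated_tbw * 1000 / OBSERVED_GB_PER_DAY / 365

print(f"rated:    {rated_gb_per_day:.0f} GB/day, "
      f"{rated_tbw:.0f} TBW over {RATING_YEARS} years")
print(f"observed: {OBSERVED_GB_PER_DAY} GB/day "
      f"({OBSERVED_GB_PER_DAY / rated_gb_per_day:.2f}x the rating), "
      f"rated endurance consumed in ~{years_to_exhaust:.1f} years")

With these placeholder values the write load already exceeds the rated daily
budget, and doubling the capacity (e.g. the 1TB SKU mentioned above) doubles
that budget for the same workload, which I take to be the point of the
larger-SKU and overprovisioning suggestion.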
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


