>>> Fast write enabled would mean that the primary OSD sends #size copies to the
>>> entire acting set (including itself) in parallel and sends an ACK to the
>>> client as soon as min_size ACKs have been received from the peers (including
>>> itself). In this way, one can tolerate (size - min_size) slow(er) OSDs (slow
>>> for whatever reason) without suffering performance penalties immediately
>>> (only after too many requests have started piling up, which will show up as
>>> a slow-requests warning).
>>>
>> What happens if an error occurs on the slowest OSD after the min_size ACK
>> has already been sent to the client?
>>
> This should not be different from what exists today, unless of course the
> error happens on the local/primary OSD.

Can this be addressed with reasonable effort? I don't expect this to be a
quick fix, and it would need testing. However, beating the tail-latency
statistics with the extra redundancy should be worth it.

I observe latency fluctuations: OSDs become randomly slow for whatever reason
for short time intervals and then return to normal. One reason could be DB
compaction; latency tends to spike while a compaction is running. A fast-write
option would effectively remove the impact of this.

Best regards and thanks for considering this!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
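
P.S. The ACK policy described in the quoted text above can be sketched roughly
as follows. This is a hypothetical illustration in Python/asyncio, not Ceph
code; `replicate`, `fast_write`, and the simulated latencies are all invented
for the sketch:

```python
# Hypothetical sketch of the proposed "fast write" ACK policy (not Ceph code).
# The primary dispatches the write to all `size` replicas in parallel and
# acknowledges the client once `min_size` replicas (including itself) have
# ACKed; the remaining, slower replicas complete in the background.
import asyncio
import random

async def replicate(osd_id: int, data: bytes) -> int:
    # Stand-in for one replica write; randomized latency models a slow OSD.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return osd_id

async def fast_write(data: bytes, acting_set: list[int], min_size: int) -> list[int]:
    tasks = [asyncio.create_task(replicate(osd, data)) for osd in acting_set]
    acked: list[int] = []
    for fut in asyncio.as_completed(tasks):
        acked.append(await fut)
        if len(acked) >= min_size:
            break  # client ACK point: min_size copies are durable
    # Stragglers keep running; an error on one of them would be handled by
    # the usual recovery path, as the reply above suggests.
    return acked

acked = asyncio.run(fast_write(b"obj", acting_set=[0, 1, 2], min_size=2))
print(len(acked))  # 2
```

With size=3 and min_size=2, the client ACK waits only for the two fastest
replicas, so one arbitrarily slow OSD no longer sits on the write's critical
path.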