Hi Martin,

did you change from rep=2+min_size=1 to rep=3+min_size=2 in one go? I'm wondering if the missing extra shard could cause PGs to go read-only occasionally. Maybe keep min_size=1 until all PGs have 3 shards and then set min_size=2.

You can set recovery_sleep to a non-zero value. It is zero by default, which means recovery can take over all IO. We set it to a small number between 0.0025 and 0.05, depending on drive performance. The way we tuned it was to have a massive backfill operation going on, and then:

- set osd_recovery_sleep to 0 and take note of the average recovery throughput
- increase osd_recovery_sleep until ca. 30-50% of IO capacity is used by recovery

Then the remaining IO capacity is guaranteed to be available for clients. This works really well in our set-up.

Something specific about Quincy is the use of the mClock scheduler. You can try to set it back to wpq or look at the high-client-IO profiles.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: martin.konold@xxxxxxxxxx <martin.konold@xxxxxxxxxx> on behalf of Konold, Martin <martin.konold@xxxxxxxxxx>
Sent: 19 November 2022 18:06:54
To: ceph-users@xxxxxxx
Subject: Re: backfilling kills rbd performance

Hi,

On 2022-11-19 17:32, Anthony D'Atri wrote:
> I'm not positive that the options work with hyphens in them. Try
>
> ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_max_single_start 1 --osd_recovery_op_priority=1'

Did so.

> With Quincy the following should already be set, but to be sure:
>
> ceph tell osd.* config set osd_op_queue_cut_off high

Did so too, and even restarted all OSDs as recommended. I then stopped a single OSD in order to cause some backfilling.

> What is network saturation like on that 1GE replication network?

Typically 100% saturated.

> Operations like yours that cause massive data movement could easily saturate a pipe that narrow.

Sure, but I am used to other setups where recovery can be slowed down in order to keep the RBDs operating.

To me it looks like all backfilling happens in parallel without any pauses in between, which would benefit the client traffic. I would expect some of those PGs to be in active+undersized+degraded+remapped+backfill_wait state instead of backfilling.

2022-11-19T16:58:50.139390+0000 mgr.pve-02 (mgr.18134134) 61735 : cluster [DBG] pgmap v60978: 576 pgs: 102 active+undersized+degraded+remapped+backfilling, 474 active+clean; 2.4 TiB data, 4.3 TiB used, 10 TiB / 15 TiB avail; 150 KiB/s wr, 10 op/s; 123337/1272524 objects degraded (9.692%); 228 MiB/s, 58 objects/s recovering

Is this Quincy specific?

Regards
--martin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
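
For reference, a rough sketch of the knobs discussed in this thread as plain ceph CLI commands. The values are illustrative only and need to be tuned with the procedure Frank describes; <pool> is a placeholder for the affected pool, and note that under the mClock scheduler the recovery_sleep settings may not take effect, which is one reason to try wpq.

# Throttle recovery/backfill per device class (example values, tune per drive performance):
ceph config set osd osd_recovery_sleep_hdd 0.05
ceph config set osd osd_recovery_sleep_ssd 0.0025

# Quincy defaults to the mClock scheduler; either pick a client-friendly profile ...
ceph config set osd osd_mclock_profile high_client_ops
# ... or switch back to wpq (takes effect after an OSD restart):
ceph config set osd osd_op_queue wpq

# Temporarily relax min_size until all PGs have three shards, then set it back:
ceph osd pool set <pool> min_size 1
ceph osd pool set <pool> min_size 2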