On Thu, Oct 17, 2019 at 12:35 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:
>
> Hello, Robert,
>
> Thanks for the quick reply. I did test with osd op queue = wpq and osd op queue cut off = high, and:
>
> osd_recovery_op_priority = 1
> osd recovery delay start = 20
> osd recovery max active = 1
> osd recovery max chunk = 1048576
> osd recovery sleep = 1
> osd recovery sleep hdd = 1
> osd recovery sleep ssd = 1
> osd recovery sleep hybrid = 1
> osd recovery priority = 1
> osd max backfills = 1
> osd backfill scan max = 16
> osd backfill scan min = 4
> osd_op_thread_suicide_timeout = 300
>
> But the cluster still showed extremely heavy recovery activity at the beginning of the recovery, and only after roughly 5-10 minutes did the recovery gradually come under control. I guess this is quite similar to what you encountered in Nov. 2015.
>
> It is really annoying. What else can I do to mitigate this weird initial-recovery issue? Any suggestions are much appreciated.

Hmm, on our Luminous cluster we run the defaults for everything other than the op queue and cut off, and bringing in a node has nearly zero impact on client traffic. Those two would need to be set on all OSDs to be completely effective. Maybe go back to the defaults?
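A minimal sketch of what I mean, assuming a systemd-based deployment (ceph-osd.target is the usual unit target, adjust for your setup); as far as I know both options are only read at OSD start, so the OSDs need a restart afterwards:

    # /etc/ceph/ceph.conf on every OSD node
    [osd]
    osd op queue = wpq
    osd op queue cut off = high

    # then restart the OSDs on each node, e.g.
    systemctl restart ceph-osd.target

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com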