Re: Openstack VM IOPS drops dramatically during Ceph recovery

My Ceph version is Luminous 12.2.12. Do you think I should upgrade to Nautilus? Will Nautilus give better control over recovery/backfilling?

best regards,

Samuel


huxiaoyu@xxxxxxxxxxxx
 
From: Robert LeBlanc
Date: 2019-10-14 16:27
To: huxiaoyu@xxxxxxxxxxxx
CC: ceph-users
Subject: Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery
On Thu, Oct 10, 2019 at 2:23 PM huxiaoyu@xxxxxxxxxxxx
<huxiaoyu@xxxxxxxxxxxx> wrote:
>
> Hi, folks,
>
> I have a middle-sized Ceph cluster used as a Cinder backend for OpenStack (Queens). During testing, one Ceph node went down unexpectedly and was powered up again about 10 minutes later, at which point the Ceph cluster started PG recovery. To my surprise, VM IOPS dropped dramatically during recovery, from roughly 13K IOPS to about 400, a factor of 1/30, even though I had put stringent throttling on backfill and recovery with the following Ceph parameters:
>
>     osd_max_backfills = 1
>     osd_recovery_max_active = 1
>     osd_client_op_priority=63
>     osd_recovery_op_priority=1
>     osd_recovery_sleep = 0.5
>
> The weirdest thing is:
> 1) When there is no IO activity from any VM (all VMs are quiet except for the recovery IO), the recovery bandwidth is about 10 MiB/s, 2 objects/s. The recovery throttle settings seem to be working properly.
> 2) When running FIO inside a VM, the recovery bandwidth quickly climbs above 200 MiB/s, 60 objects/s, while FIO inside the VM only reaches about 400 IOPS (8 KiB block size), around 3 MiB/s. The recovery throttling obviously DOES NOT work properly.
> 3) If I stop the FIO test in the VM, the recovery bandwidth drops back to 10 MiB/s, 2 objects/s, strangely enough.
>
> How can this weird behavior happen? Is there a way to pin the recovery bandwidth to a specific value, or to cap the number of recovered objects per second? That would give much better control of backfilling/recovery than the apparently faulty logic of relative osd_client_op_priority vs osd_recovery_op_priority.
>
> Any ideas or suggestions for keeping recovery under control?
>
> best regards,
>
> Samuel
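 
On the question of pinning recovery to a fixed rate: as far as I know, Luminous has no direct bandwidth or objects-per-second limit, only the indirect throttles listed above. A minimal sketch of tightening them at runtime with injectargs (the values are illustrative, not recommendations):
 
    # slow recovery further while client IO is active; no OSD restart needed
    ceph tell osd.* injectargs '--osd_recovery_sleep 1.0'
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'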
 
Not sure which version of Ceph you are on, but add these to your
/etc/ceph/ceph.conf on all your OSDs and restart them.
 
osd op queue = wpq
osd op queue cut off = high
 
That should really help and make backfills and recovery non-impactful.
This will be the default in Octopus.
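 
For reference, a minimal sketch of what that could look like in ceph.conf, plus a way to confirm the values on a running OSD through its admin socket (osd.0 is just a placeholder for a local OSD id):
 
    [osd]
    osd op queue = wpq
    osd op queue cut off = high
 
    # after restarting the OSD, check that the new values are in effect
    ceph daemon osd.0 config get osd_op_queue
    ceph daemon osd.0 config get osd_op_queue_cut_off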
 
--------------------------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
