Re: deep-scrubbing has large impact on performance

Eugen Block <eblock@xxxxxx> · Tue, 22 Nov 2016 11:11:08 +0100

Thanks for the very quick answer!

> If you are using Jewel

We are still using Hammer (0.94.7). We were planning to upgrade to Jewel
in a couple of weeks; would you recommend doing it now?

Quoting Nick Fisk <nick@xxxxxxxxxx>:

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On  
Behalf Of Eugen Block
Sent: 22 November 2016 09:55
To: ceph-users@xxxxxxxxxxxxxx
Subject:  deep-scrubbing has large impact on performance

Hi list,

I've been searching the mail archive and the web for some help. I
tried the things I found, but I can't see any effect. We use Ceph for
our OpenStack environment.

When our cluster (2 pools, each 4092 PGs, in 20 OSDs on 4 nodes, 3
MONs) starts deep-scrubbing, it's impossible to work with the VMs.
Currently, the deep-scrubs happen to start on Monday, which is
unfortunate. I already plan to start the next deep-scrub on Saturday,
so that it has no impact on our work days. But if I imagine we had a
large multi-datacenter setup, such performance drops would not be
acceptable. So I'm wondering: how do you guys manage that?

What I've tried so far:

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
ceph tell osd.* injectargs '--osd_scrub_end_hour 7'
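
The last two options restrict scheduled scrubs to a window of hours. A minimal sketch of how such a window can be evaluated follows, assuming the interval is half-open (begin inclusive, end exclusive) and wraps past midnight when begin is not smaller than end. That is my understanding of the OSD's check, not something stated in this thread, so verify it against your release. The helper name in_scrub_window is hypothetical and not part of Ceph.

```shell
# Sketch only: in_scrub_window is a hypothetical helper, not a Ceph command.
# Succeeds (exit status 0) when HOUR lies inside the window [BEGIN, END).
# If BEGIN >= END, the window is assumed to wrap past midnight.
in_scrub_window() {
    hour=$1 begin=$2 end=$3
    if [ "$begin" -lt "$end" ]; then
        # Plain window, e.g. 0..7: both bounds must hold.
        [ "$hour" -ge "$begin" ] && [ "$hour" -lt "$end" ]
    else
        # Wrapping window, e.g. 22..6: either bound is enough.
        [ "$hour" -ge "$begin" ] || [ "$hour" -lt "$end" ]
    fi
}

# Example with the values used above (begin 0, end 7):
#   in_scrub_window 3 0 7    succeeds, 03:00 is inside the window
#   in_scrub_window 12 0 7   fails, noon is outside the window
```

Under this assumption a scrub could start at 06:59 and still be running after 07:00, so the window limits when scrubs begin, not when they finish.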

And I also added these options to ceph.conf.
To be able to work again, I had to set the nodeep-scrub flag and
unset it when I left the office. Today I see the cluster
deep-scrubbing again, but only one PG at a time. It seems the default
for osd_max_scrubs is now taking effect, and I don't see a major
impact yet.
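
The set/unset workaround described above can be wrapped in two small functions. This is only a sketch: the function names and the CEPH override variable are made up for illustration, while `ceph osd set nodeep-scrub` and `ceph osd unset nodeep-scrub` are the regular CLI commands for the flag.

```shell
# CEPH defaults to the real CLI; override it for a dry run,
# e.g. CEPH="echo ceph" (an assumption made for illustration).
CEPH="${CEPH:-ceph}"

# Hypothetical wrappers around the cluster-wide nodeep-scrub flag.
# While the flag is set, no new deep-scrubs are scheduled.
pause_deep_scrub()  { $CEPH osd set nodeep-scrub; }
resume_deep_scrub() { $CEPH osd unset nodeep-scrub; }

# Example: call pause_deep_scrub in the morning and
# resume_deep_scrub when leaving the office.
```

As far as I know, the flag only keeps new deep-scrubs from starting; one that is already running on a PG is not interrupted.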

But is there something else I can do to reduce the performance impact?

If you are using Jewel, the scrubbing is now done in the client IO
thread, so those disk thread options won't do anything. Instead
there is a new priority setting, which seems to work for me, along
with a few other settings.

osd_scrub_priority = 1
osd_scrub_sleep = .1
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd_scrub_load_threshold = 5

Also, enabling the weighted priority queue can assist the new priority options:

osd_op_queue = wpq
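
To try the scrub settings listed above on a running Jewel cluster before putting them into ceph.conf, they could be injected the same way as the commands in the original post. A rough sketch follows; the function name and the CEPH override variable are hypothetical. osd_op_queue is left out on purpose, since I believe it is only read when an OSD starts, and whether each of the other options takes effect at runtime may depend on the release.

```shell
# CEPH defaults to the real CLI; override it for a dry run,
# e.g. CEPH="echo ceph" (an assumption made for illustration).
CEPH="${CEPH:-ceph}"

# Hypothetical helper: inject the scrub settings quoted above
# into all running OSDs, one option per call.
apply_scrub_settings() {
    for opt in \
        "osd_scrub_priority 1" \
        "osd_scrub_sleep 0.1" \
        "osd_scrub_chunk_min 1" \
        "osd_scrub_chunk_max 5" \
        "osd_scrub_load_threshold 5"
    do
        # Option and value go in as one argument, as in the original post.
        $CEPH tell 'osd.*' injectargs "--$opt"
    done
}
```

Injected values are lost when an OSD restarts, so anything that proves useful still has to go into ceph.conf.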

I just found [1] and will have a look into it.

[1] http://prob6.com/en/ceph-pg-deep-scrub-cron/

Thanks!
Eugen

--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : eblock@xxxxxx

         Chairwoman of the Supervisory Board: Angelika Mozdzen
   Registered office and court of registration: Hamburg, HRB 90934
                Executive Board: Jens-U. Mozdzen
                    VAT ID no. DE 814 013 983

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
