Re: osd max scrubs not honored?

This isn't an answer, but a suggestion to help track it down, as I'm not sure
what the problem is. Try querying the admin socket on your OSDs and looking
through all of their config options and settings for something that might
explain why you have multiple deep scrubs happening on a single OSD at the
same time.

However, if you misspoke and only have 1 deep scrub per OSD but multiple per
node, then what you are seeing is expected behavior.  I believe that Luminous
added a sleep setting for scrub I/O that might also help.  Looking through the
admin socket dump of settings for anything scrub-related should give you some
ideas of things to try.
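
For example, a rough sketch of that check (osd.0 and the 0.1 sleep value are
just placeholders; run the daemon commands on the host carrying that OSD):

ceph daemon osd.0 config show | grep -i scrub        # dump running config, keep scrub-related options
ceph daemon osd.0 config get osd_max_scrubs          # confirm the per-OSD scrub limit actually in effect
ceph tell 'osd.*' injectargs '--osd_scrub_sleep 0.1' # the Luminous scrub sleep; 0.1s is only an example value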


On Tue, Sep 26, 2017, 2:04 PM J David <j.david.lists@xxxxxxxxx> wrote:
With “osd max scrubs” set to 1 in ceph.conf, which I believe is also the
default, there are 2-3 deep scrubs running at almost all times.
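
For reference, the relevant ceph.conf stanza looks like this (shown under
[osd] here; the exact section it lives in is just an assumption):

[osd]
osd max scrubs = 1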

3 simultaneous deep scrubs is enough to cause a constant stream of:

mon.ceph1 [WRN] Health check update: 69 slow requests are blocked > 32
sec (REQUEST_SLOW)

This seems to correspond with all three deep scrubs hitting the same
OSD at the same time, starving out all other I/O requests for that
OSD.  But it can happen less frequently and less severely with two or
even one deep scrub running.  Nonetheless, consumers of the cluster
are not thrilled with regular instances of 30-60 second disk I/Os.

The cluster is five nodes, 15 OSDs, and there is one pool with 512
placement groups.  The cluster is running:

ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

All of the OSDs are bluestore, with HDD storage and SSD block.db.

Even setting “osd deep scrub interval = 1843200” hasn’t resolved this
issue, though it seems to get the number down from 3 to 2, which at
least cuts down on the frequency of requests stalling out.  With 512
pgs, that should mean that one pg gets deep-scrubbed per hour, and it
seems like a deep-scrub takes about 20 minutes.  So what should be
happening is that 1/3rd of the time there should be one deep scrub,
and 2/3rds of the time there shouldn’t be any.  Yet instead we have
2-3 deep scrubs running at all times.
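
Spelling out the arithmetic behind that expectation:

1843200 s interval / 512 pgs = 3600 s, i.e. one new deep scrub starting per hour
~20 min per deep scrub / 60 min between starts ≈ a deep scrub running 1/3 of the time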

Looking at “ceph pg dump” shows that about 7 deep scrubs get launched per hour:

$ sudo ceph pg dump | fgrep active | awk '{print $23" "$24" "$1}' |
fgrep 2017-09-26 | sort -rn | head -22
dumped all
2017-09-26 16:42:46.781761 0.181
2017-09-26 16:41:40.056816 0.59
2017-09-26 16:39:26.216566 0.9e
2017-09-26 16:26:43.379806 0.19f
2017-09-26 16:24:16.321075 0.60
2017-09-26 16:08:36.095040 0.134
2017-09-26 16:03:33.478330 0.b5
2017-09-26 15:55:14.205885 0.1e2
2017-09-26 15:54:31.413481 0.98
2017-09-26 15:45:58.329782 0.71
2017-09-26 15:34:51.777681 0.1e5
2017-09-26 15:32:49.669298 0.c7
2017-09-26 15:01:48.590645 0.1f
2017-09-26 15:01:00.082014 0.199
2017-09-26 14:45:52.893951 0.d9
2017-09-26 14:43:39.870689 0.140
2017-09-26 14:28:56.217892 0.fc
2017-09-26 14:28:49.665678 0.e3
2017-09-26 14:11:04.718698 0.1d6
2017-09-26 14:09:44.975028 0.72
2017-09-26 14:06:17.945012 0.8a
2017-09-26 13:54:44.199792 0.ec
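
For reference, the deep scrubs actually in flight (and the OSDs they map to)
can be pulled out of the same dump with something like this, since pgs_brief
includes the up/acting sets:

$ sudo ceph pg dump pgs_brief | fgrep scrubbing+deep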

What’s going on here?

Why isn’t the limit on scrubs being honored?

It would also be great if scrub I/O were surfaced in “ceph status” the
way recovery I/O is, especially since it can have such a significant
impact on client operations.

Thanks!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com