Hello,

On Thu, 28 Sep 2017 22:36:22 +0000 Gregory Farnum wrote:

> Also, realize the deep scrub interval is a per-PG thing and (unfortunately)
> the OSD doesn't use a global view of its PG deep scrub ages to try and
> schedule them intelligently across that time. If you really want to try and
> force this out, I believe a few sites have written scripts to do it by
> turning off deep scrubs, forcing individual PGs to deep scrub at intervals,
> and then enabling deep scrubs again.
> -Greg
>

This approach works best and w/o surprises down the road if
osd_scrub_interval_randomize_ratio is disabled and osd_scrub_begin_hour and
osd_scrub_end_hour are set to your needs.

I basically kick the deep scrubs off on a per-OSD basis (one at a time and
staggered, of course), and if your cluster is small/fast enough that pattern
will be retained indefinitely, with only one PG doing a deep scrub at any
given time (with the default max scrubs of 1, of course).

Christian

> On Wed, Sep 27, 2017 at 6:34 AM David Turner <drakonstein@xxxxxxxxx> wrote:
>
> > This isn't an answer, but a suggestion to try and help track it down, as
> > I'm not sure what the problem is. Try querying the admin socket for your
> > OSDs and look through all of their config options and settings for
> > something that might explain why you have multiple deep scrubs happening
> > on a single OSD at the same time.
> >
> > However, if you misspoke and only have 1 deep scrub per OSD but multiple
> > per node, then what you are seeing is expected behavior. I believe that
> > Luminous added a sleep setting for scrub IO that also might help. Looking
> > through the admin socket dump of settings for anything scrub-related
> > should give you some ideas of things to try.
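
A minimal example of that admin-socket query, assuming an OSD id of 0 and the
default socket path (adjust both for your deployment):

  ceph daemon osd.0 config show | grep scrub

or, talking to the socket directly:

  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep scrub
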
> > On Tue, Sep 26, 2017, 2:04 PM J David <j.david.lists@xxxxxxxxx> wrote:
> >
> >> With "osd max scrubs" set to 1 in ceph.conf, which I believe is also
> >> the default, at almost all times there are 2-3 deep scrubs running.
> >>
> >> Three simultaneous deep scrubs are enough to cause a constant stream of:
> >>
> >> mon.ceph1 [WRN] Health check update: 69 slow requests are blocked > 32
> >> sec (REQUEST_SLOW)
> >>
> >> This seems to correspond with all three deep scrubs hitting the same
> >> OSD at the same time, starving out all other I/O requests for that
> >> OSD. But it can happen less frequently and less severely with two or
> >> even one deep scrub running. Nonetheless, consumers of the cluster
> >> are not thrilled with regular instances of 30-60 second disk I/Os.
> >>
> >> The cluster is five nodes, 15 OSDs, and there is one pool with 512
> >> placement groups. The cluster is running:
> >>
> >> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> >> luminous (rc)
> >>
> >> All of the OSDs are bluestore, with HDD storage and SSD block.db.
> >>
> >> Even setting "osd deep scrub interval = 1843200" hasn't resolved this
> >> issue, though it seems to get the number down from 3 to 2, which at
> >> least cuts down on the frequency of requests stalling out. With 512
> >> pgs and a 1843200-second (512-hour) interval, one pg should get
> >> deep-scrubbed per hour, and a deep scrub seems to take about 20
> >> minutes. So what should be happening is that 1/3rd of the time there
> >> should be one deep scrub, and 2/3rds of the time there shouldn't be
> >> any. Yet instead we have 2-3 deep scrubs running at all times.
> >>
> >> Looking at "ceph pg dump" shows that about 7 deep scrubs get launched
> >> per hour:
> >>
> >> $ sudo ceph pg dump | fgrep active | awk '{print $23" "$24" "$1}' |
> >> fgrep 2017-09-26 | sort -rn | head -22
> >> dumped all
> >> 2017-09-26 16:42:46.781761 0.181
> >> 2017-09-26 16:41:40.056816 0.59
> >> 2017-09-26 16:39:26.216566 0.9e
> >> 2017-09-26 16:26:43.379806 0.19f
> >> 2017-09-26 16:24:16.321075 0.60
> >> 2017-09-26 16:08:36.095040 0.134
> >> 2017-09-26 16:03:33.478330 0.b5
> >> 2017-09-26 15:55:14.205885 0.1e2
> >> 2017-09-26 15:54:31.413481 0.98
> >> 2017-09-26 15:45:58.329782 0.71
> >> 2017-09-26 15:34:51.777681 0.1e5
> >> 2017-09-26 15:32:49.669298 0.c7
> >> 2017-09-26 15:01:48.590645 0.1f
> >> 2017-09-26 15:01:00.082014 0.199
> >> 2017-09-26 14:45:52.893951 0.d9
> >> 2017-09-26 14:43:39.870689 0.140
> >> 2017-09-26 14:28:56.217892 0.fc
> >> 2017-09-26 14:28:49.665678 0.e3
> >> 2017-09-26 14:11:04.718698 0.1d6
> >> 2017-09-26 14:09:44.975028 0.72
> >> 2017-09-26 14:06:17.945012 0.8a
> >> 2017-09-26 13:54:44.199792 0.ec
> >>
> >> What's going on here?
> >>
> >> Why isn't the limit on scrubs being honored?
> >>
> >> It would also be great if scrub I/O were surfaced in "ceph status" the
> >> way recovery I/O is, especially since it can have such a significant
> >> impact on client operations.
> >>
> >> Thanks!

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
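
For anyone who wants to try the approach Greg describes at the top of the
thread (turn off deep scrubs, force individual PGs to deep scrub at intervals,
then turn deep scrubs back on), an untested sketch might look like the
following; the PG selection and the sleep interval are assumptions to tune for
your own cluster:

#!/bin/bash
# Sketch only: stop the OSDs from scheduling deep scrubs themselves,
# then walk the active PGs one at a time.
ceph osd set nodeep-scrub

for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '/active/ {print $1}'); do
    ceph pg deep-scrub "$pg"
    sleep 1200   # assumed ~20 minutes per deep scrub; adjust to your hardware
done

# Let the OSDs schedule deep scrubs on their own again.
ceph osd unset nodeep-scrub

Christian's per-OSD variant would instead issue "ceph osd deep-scrub <osd-id>"
for one OSD at a time, staggered, with osd_scrub_interval_randomize_ratio set
to 0 and osd_scrub_begin_hour/osd_scrub_end_hour restricting the scrub window.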