Hello,

On Thu, 20 Oct 2016 11:23:54 +0200 Oliver Dzombic wrote:

> Hi,
>
> we have here globally:
>
> osd_client_op_priority = 63
> osd_disk_thread_ioprio_class = idle
> osd_disk_thread_ioprio_priority = 7
> osd_max_scrubs = 1
>
If you google for osd_max_scrubs you will find plenty of threads, bug
reports, etc.
The most significant and beneficial impact for client I/O can be achieved
by telling scrub to release its deadly grip on the OSDs with something
like:

osd_scrub_sleep = 0.1

Also, which version, Hammer IIRC?
Jewel's unified queue should help as well, but no first-hand experience
here.

> to influence the scrubbing performance and
>
> osd_scrub_begin_hour = 1
> osd_scrub_end_hour = 7
>
> to influence the scrubbing time frame
>
> Now, as it seems, this time frame is/was not enough, so ceph started
> scrubbing all the time, I assume because of the age of the objects.
>
You may want to line things up so that OSDs/PGs are evenly spread out.
For example with 6 OSDs, manually initiate a deep scrub each day (at
01:00 in your case), so that only a specific subset is doing the
deep-scrub conga at any time.

> And it does it with:
>
> 4 active+clean+scrubbing+deep
>
> (instead of the configured 1)
>
That's per OSD, not global; see above, google.

> So now, we experience a situation where the spinning drives are so
> busy that the IO performance got too bad.
>
> The only reason that it's not a catastrophe is that we have a cache
> tier in front of it, which lowers the IO needs on the spinning drives.
>
> Unluckily we also have some pools going directly on the spinning
> drives.
>
> So these pools experience a very bad IO performance.
>
> So we had to disable scrubbing during business hours (which is not
> really a solution).
>
It is, unfortunately, for many people.
As mentioned many times, if your cluster is having issues with
deep-scrubs during peak hours, it will also be unhappy if you lose an
OSD and backfills happen.
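A minimal sketch of the staggering idea above, assuming 6 OSDs and a
nightly cron job at 01:00 that deep-scrubs 2 OSDs per night (the OSD
count, batch size, and the dry-run echo are all illustrative choices,
not anything from your cluster):

```shell
#!/bin/sh
# Rotate deep scrubs across OSDs so only a subset runs each night.
# Assumptions: 6 OSDs numbered 0-5, 2 scrubbed per night, run from cron.
NUM_OSDS=6
PER_NIGHT=2
DAY=$(date +%u)                               # day of week, 1..7
START=$(( (DAY * PER_NIGHT) % NUM_OSDS ))     # rotating offset into the OSD list
i=0
while [ "$i" -lt "$PER_NIGHT" ]; do
  osd=$(( (START + i) % NUM_OSDS ))
  # echo makes this a dry run; remove it to actually kick off the scrubs
  echo ceph osd deep-scrub osd.$osd
  i=$((i + 1))
done
```

With osd_scrub_begin_hour/osd_scrub_end_hour still set, the automatic
scheduler then mostly finds the PGs already freshly deep-scrubbed.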
If it is unhappy with normal scrubs, you need to upgrade/expand HW
immediately.

> So any idea why
>
> 1. 4-5 scrubs we can see, while osd_max_scrubs = 1 is set?
>
See above.
With BlueStore in the wings and a reduced (negated?) need for
deep-scrubs, I doubt this will see much coding effort.

> 2. Why the impact on the spinning drives is so hard, while we lowered
> the IO priority for it?
>
That has only a small impact; deep-scrub by its very nature reads all
objects and thus kills I/O by seeking and polluting caches.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com