On Tue, Apr 25, 2017 at 3:04 PM, Martin Millnert <martin@xxxxxxxxxxx> wrote:
> Hi,
>
> we are experiencing significant impact from deep scrubs on Jewel, and
> have started investigating op priorities. We use the default values for
> the related/relevant OSD priority settings.
>
> The "osd op queue" entry on
> http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#operations
> states: "The normal queue is different between implementations."
>
> So... in Jewel, where, other than the code, can I learn what the queue
> behavior is? Is there anyone here who is familiar with it?
>
> I'd like to understand whether "prio" in Jewel works as explained, i.e.
> something similar to the following pseudo code:
>
> if len(subqueue) > 0:
>     dequeue(subqueue)
> if tokens(global) > some_cost:
>     for queue in queues_high_to_low:
>         if len(queue) > 0:
>             dequeue(queue)
>             tokens = tokens - some_other_cost
> else:
>     for queue in queues_low_to_high:
>         if len(queue) > 0:
>             dequeue(queue)
>             tokens = tokens - some_other_cost
> tokens = min(tokens + some_refill_rate, max_tokens)

That looks about right.

> The background, for anyone interested, is:
>
> If the behavior is similar to the above, it would explain the extreme OSD
> commit latencies / client latency we see. My current theory is that deep
> scrub quite possibly consumes all available tokens, so that when a client
> op arrives with priority(client_io) > priority([deep_]scrub), the prio
> queue essentially inverts and low-priority ops get priority over
> high-priority ops.
>
> The OSDs are SMR, but the question here is specifically not how they
> perform (we're quite intimately aware of their performance profiles), but
> how to tame Ceph so the cluster behaves as well as possible in the
> normal case.
>
> I put up some graphs at https://martin.millnert.se/ceph/jewel_prio/ :
> - OSD journal/commit/apply latencies show very strong correlation with
>   ongoing deep scrubs.
> - When latencies are low and noisy there's essentially no client IO
>   happening.
> - There is some evidence the write latency shoots through the roof --
>   but there isn't much client write occurring... Possibly deep scrub
>   causes disk write IO?
>   * mount opts used are:
>     [...] type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
>
> The objective is to improve servicing time of client IO, especially
> reads, while a deep scrub is occurring. It essentially doesn't matter to
> us whether a deep scrub takes x or 3x time; more consistent latency for
> clients is more important.

I don't have any experience with SMR drives, so it wouldn't surprise me if
there are some exciting emergent effects with them. But it sounds to me
like you want to start by adjusting osd_scrub_priority (default 5) and
osd_scrub_cost (default 50 << 20, i.e. 50MB). Those directly affect how
scrub ops move through the queue in relation to client ops.

(There is also the family of scrub scheduling options, which might make
sense if you are more tolerant of slow IO at certain times of the
day/week, but I'm not familiar with them.)

-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
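
For reference, the two options Greg mentions can be set in ceph.conf under
the [osd] section. The values below are illustrative starting points only,
not tested recommendations; experiment and measure on your own cluster.

```
[osd]
# Priority of scrub ops relative to client ops (default 5).
osd scrub priority = 1
# Cost charged to the queue per scrub op (default 50 << 20, i.e. 50MB).
osd scrub cost = 52428800
```

Both can also be changed at runtime on running OSDs, e.g.:
`ceph tell osd.* injectargs '--osd_scrub_priority 1'`.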
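
For anyone following along, the pseudo code quoted above can be turned into
a small runnable Python sketch of the suspected inversion. This is a toy
model of the described token-bucket behavior only, not the actual Ceph
WeightedPriorityQueue implementation; all class names, constants, and the
fixed per-op cost are illustrative assumptions.

```python
from collections import defaultdict


class TokenBucketPrioQueue:
    """Toy model: a strict subqueue, per-priority FIFOs, and a shared
    token bucket. While tokens remain, higher priorities are served
    first; once the bucket is drained, the order inverts."""

    def __init__(self, max_tokens=100, refill_rate=10, min_cost=20):
        self.strict = []                 # the "subqueue" in the pseudo code
        self.queues = defaultdict(list)  # priority -> FIFO of ops
        self.tokens = max_tokens
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.min_cost = min_cost         # illustrative fixed cost per op

    def enqueue(self, priority, op, strict=False):
        if strict:
            self.strict.append(op)
        else:
            self.queues[priority].append(op)

    def dequeue(self):
        op = None
        if self.strict:
            op = self.strict.pop(0)
        elif self.tokens >= self.min_cost:
            # Tokens available: serve the highest priority first.
            for prio in sorted(self.queues, reverse=True):
                if self.queues[prio]:
                    op = self.queues[prio].pop(0)
                    self.tokens -= self.min_cost
                    break
        else:
            # Bucket drained: order inverts, lowest priority first.
            for prio in sorted(self.queues):
                if self.queues[prio]:
                    op = self.queues[prio].pop(0)
                    self.tokens -= self.min_cost
                    break
        self.tokens = min(self.tokens + self.refill_rate, self.max_tokens)
        return op


# Demonstration of the inversion: once the bucket is drained, the
# low-priority scrub op is served before the waiting high-priority client op.
q = TokenBucketPrioQueue(max_tokens=40, refill_rate=0, min_cost=20)
q.enqueue(63, "client_read")
q.enqueue(5, "deep_scrub")
q.dequeue()                  # "client_read": tokens available, high prio wins
q.enqueue(63, "client_read_2")
q.dequeue()                  # "client_read_2": this drains the bucket
q.enqueue(63, "client_read_3")
q.dequeue()                  # "deep_scrub": bucket empty, order inverts
```

If scrub ops are expensive enough to keep the bucket empty, this model
reproduces exactly the symptom described: client ops queue behind scrub
despite their higher priority.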