Hi,

We are experiencing significant impact from deep scrubs on Jewel, so I started investigating op priorities. We use the default values for the relevant OSD priority settings.

The "osd op queue" entry at http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#operations states: "The normal queue is different between implementations."

So, in Jewel, where other than the code can I learn what the queue behavior is? Is there anyone who is familiar with it?

I'd like to understand whether "prio" in Jewel works as explained, i.e. something similar to the following pseudocode:

  if len(subqueue) > 0:
    dequeue(subqueue)
  if tokens(global) > some_cost:
    for queue in queues_high_to_low:
      if len(queue) > 0:
        dequeue(queue)
        tokens = tokens - some_other_cost
  else:
    for queue in queues_low_to_high:
      if len(queue) > 0:
        dequeue(queue)
        tokens = tokens - some_other_cost
  tokens = min(tokens + some_refill_rate, max_tokens)

The background, for anyone interested, is this: if the behavior is similar to the above, it would explain the extreme OSD commit latencies / client latency we see. My current theory is that the deep scrub quite possibly consumes all available tokens, such that when a client op arrives, and priority(client_io) > priority([deep_]scrub), the prio queue essentially inverts and low-priority ops get served ahead of high-priority ops.

The OSDs are SMR, but the question here is specifically not how they perform (we're quite intimately aware of their performance profiles), but how to tame Ceph so that the cluster behaves as well as possible in the normal case.

I put up some graphs at https://martin.millnert.se/ceph/jewel_prio/ :

- OSD journal/commit/apply latencies show a very strong correlation with ongoing deep scrubs.
- When latencies are low and noisy, there is essentially no client IO happening.
- There is some evidence that write latency shoots through the roof, but there isn't much client write occurring... Could deep scrub be causing disk write IO?
  * mount opts used are: [...] type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

The objective is to reduce the servicing time of client IO, especially reads, while a deep scrub is occurring. It essentially doesn't matter to us whether a deep scrub takes x or 3x the time. More consistent latency to clients is more important.

Best,
Martin Millnert
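P.S. To make the pseudocode above concrete, here is a minimal Python sketch of the behavior I am hypothesising. It is a toy model of my own pseudocode, not Ceph's actual queue implementation: the names ToyPrioQueue and simulate, the per-op costs, and the priorities 63 (client) and 5 (scrub) are illustrative assumptions. Unlike the pseudocode, it returns one op per dequeue call so that the service order is easy to read.

```python
from collections import deque


class ToyPrioQueue:
    """Toy model of the hypothesised token-bucket queue (not Ceph code)."""

    def __init__(self, max_tokens, min_cost, refill_rate):
        self.max_tokens = max_tokens
        self.min_cost = min_cost          # "some_cost" in the pseudocode
        self.refill_rate = refill_rate    # "some_refill_rate"
        self.tokens = max_tokens
        self.strict = deque()             # the strict "subqueue"
        self.queues = {}                  # priority -> deque of (cost, op)

    def enqueue(self, priority, cost, op):
        self.queues.setdefault(priority, deque()).append((cost, op))

    def dequeue(self):
        """Return the next op, or None if all queues are empty."""
        if self.strict:
            return self.strict.popleft()
        have_tokens = self.tokens > self.min_cost
        # High-to-low while tokens remain, low-to-high once they run out.
        order = sorted(self.queues, reverse=have_tokens)
        op = None
        for prio in order:
            if self.queues[prio]:
                cost, op = self.queues[prio].popleft()
                self.tokens -= cost       # "some_other_cost"
                break                     # assumption: one op per call
        self.tokens = min(self.tokens + self.refill_rate, self.max_tokens)
        return op


def simulate(max_tokens):
    """Serve two expensive scrub chunks and one cheap client op."""
    q = ToyPrioQueue(max_tokens=max_tokens, min_cost=4, refill_rate=1)
    q.enqueue(5, 8, "scrub-1")
    q.enqueue(5, 8, "scrub-2")
    served = [q.dequeue()]                # scrub-1 drains most tokens
    q.enqueue(63, 1, "client-1")          # client op arrives afterwards
    served.append(q.dequeue())
    served.append(q.dequeue())
    return served


if __name__ == "__main__":
    print(simulate(10))    # small bucket: scrub starves the client op
    print(simulate(100))   # large bucket: client op is served first
```

In this model, simulate(10) serves scrub-2 before client-1 because the first scrub chunk has already drained the bucket, whereas simulate(100) serves client-1 ahead of scrub-2. If Jewel really behaves like this, it would match the latency pattern in the graphs; whether it does is exactly what I am asking.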