Deepscrub IO impact on Jewel: What is osd_op_queue prio implementation?

Martin Millnert <martin@xxxxxxxxxxx> · Tue, 25 Apr 2017 21:04:36 +0200

Hi,

We are experiencing significant impact from deep scrubs on Jewel and
have started investigating op priorities. We use the default values for
the relevant OSD priority settings.

"osd op queue" on
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#operations
states:  "The normal queue is different between implementations."

So, in Jewel, where other than the source code can I learn how the
queue behaves? Is there anyone who's familiar with it?

I'd like to understand if "prio" in Jewel is as explained, i.e.
something similar to the following pseudo code:

  if len(subqueue) > 0:
    dequeue(subqueue)
  if tokens(global) > some_cost:
    for queue in queues_high_to_low:
      if len(queue) > 0:
        dequeue(queue)
        tokens = tokens - some_other_cost
  else:
    for queue in queues_low_to_high:
      if len(queue) > 0:
        dequeue(queue)
        tokens = tokens - some_other_cost
  tokens = min(tokens + some_refill_rate, max_tokens)

The background, for anyone interested, is:

If it is similar to the above, this would explain the extreme OSD commit
latencies / client latency. My current theory is that deep scrub quite
possibly consumes all available tokens, so that when a client op
arrives with priority(client_io) > priority([deep_]scrub), the prio
queue essentially inverts and low-priority ops are served ahead of
high-priority ops.
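
A minimal Python sketch of this theory follows. It is a toy model of the
pseudo code only, not Ceph's actual implementation. The class name, the
token numbers, the per-op costs and the priorities (63 for client IO, 5
for scrub) are illustrative assumptions, as is the choice to serve one op
per dequeue call and to let the token count go negative.

```python
from collections import deque


class HypotheticalPrioQueue:
    """Toy model of the pseudo code; NOT Ceph's real queue."""

    def __init__(self, max_tokens, refill_rate, min_cost):
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.min_cost = min_cost          # "some_cost" in the pseudo code
        self.tokens = max_tokens
        self.strict = deque()             # the "subqueue", always served first
        self.queues = {}                  # priority -> deque of (name, cost)

    def enqueue(self, priority, name, cost):
        self.queues.setdefault(priority, deque()).append((name, cost))

    def dequeue(self):
        if self.strict:
            return self.strict.popleft()
        # With enough tokens, walk priorities high to low; otherwise low to high.
        high_first = self.tokens > self.min_cost
        served = None
        for prio in sorted(self.queues, reverse=high_first):
            if self.queues[prio]:
                served, cost = self.queues[prio].popleft()
                self.tokens -= cost       # "some_other_cost"; may go negative
                break                     # assumption: one op per call
        self.tokens = min(self.tokens + self.refill_rate, self.max_tokens)
        return served


def simulate(max_tokens):
    """Queue three expensive scrub ops, then one cheap client op arrives."""
    q = HypotheticalPrioQueue(max_tokens=max_tokens, refill_rate=1, min_cost=4)
    for i in range(3):
        q.enqueue(priority=5, name=f"scrub-{i}", cost=8)
    order = [q.dequeue()]                 # first scrub op drains the bucket
    q.enqueue(priority=63, name="client-0", cost=1)
    order += [q.dequeue() for _ in range(3)]
    return order


if __name__ == "__main__":
    print("small bucket:", simulate(max_tokens=10))
    print("large bucket:", simulate(max_tokens=100))
```

With the small bucket, the first scrub op leaves fewer tokens than
min_cost, so the model switches to low-to-high order and client-0 is
served after all three scrub ops. With the large bucket, client-0 is
served right after the first scrub op.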

The OSDs are SMR, but the question here is specifically not how they
perform (we're quite intimately aware of their performance profiles),
but how to tame Ceph so that the cluster behaves as well as possible in
the normal case.

I put up some graphs on https://martin.millnert.se/ceph/jewel_prio/ :
 - OSD Journal/Commit/Apply latencies show very strong correlation with
   ongoing deep scrubs.
 - When latencies are low and noisy there's essentially no client IO
   happening.
 - There is some evidence the write latency shoots through the roof,
   but there isn't much client write occurring. Possibly deep scrub
   causes disk write IO?
   * mount opts used are:
    [...] type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

The objective is to reduce the servicing time of client IO, especially
reads, while a deep scrub is occurring. It essentially doesn't matter to
us whether a deep scrub takes x or 3x the time. More consistent latency
for clients is more important.

Best,
Martin Millnert