On Tue, Apr 25, 2017 at 3:04 PM, Martin Millnert <martin@xxxxxxxxxxx> wrote:
> Hi,
>
> we are experiencing significant impact from deep scrubs on Jewel, and
> have started investigating op priorities. We use the default values for
> the related/relevant OSD priority settings.
>
> The "osd op queue" entry on
> http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#operations
> states: "The normal queue is different between implementations."
>
> So... in Jewel, where, other than the code, can I learn what the queue
> behavior is? Is there anyone here who is familiar with it?
>
> I'd like to understand whether "prio" in Jewel works as explained, i.e.
> something similar to the following pseudo code:
>
> if len(subqueue) > 0:
>     dequeue(subqueue)
> if tokens(global) > some_cost:
>     for queue in queues_high_to_low:
>         if len(queue) > 0:
>             dequeue(queue)
>             tokens = tokens - some_other_cost
> else:
>     for queue in queues_low_to_high:
>         if len(queue) > 0:
>             dequeue(queue)
>             tokens = tokens - some_other_cost
> tokens = min(tokens + some_refill_rate, max_tokens)

That looks about right.

> The background, for anyone interested, is:
>
> If the behavior is similar to the above, it would explain the extreme OSD
> commit latencies / client latency we see. My current theory is that deep
> scrub quite possibly consumes all available tokens, so that when a client
> op arrives with priority(client_io) > priority([deep_]scrub), the prio
> queue essentially inverts and low-priority ops get priority over
> high-priority ops.
>
> The OSDs are SMR, but the question here is specifically not how they
> perform (we're quite intimately aware of their performance profiles), but
> how to tame Ceph so the cluster behaves as well as possible in the
> normal case.
>
> I put up some graphs at https://martin.millnert.se/ceph/jewel_prio/ :
> - OSD journal/commit/apply latencies show very strong correlation with
>   ongoing deep scrubs.
> - When latencies are low and noisy there's essentially no client IO
>   happening.
> - There is some evidence the write latency shoots through the roof --
>   but there isn't much client write occurring... Possibly deep scrub
>   causes disk write IO?
>   * mount opts used are:
>     [...] type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
>
> The objective is to improve servicing time of client IO, especially
> reads, while a deep scrub is occurring. It essentially doesn't matter to
> us whether a deep scrub takes x or 3x time; more consistent latency for
> clients is more important.

I don't have any experience with SMR drives, so it wouldn't surprise me if
there are some exciting emergent effects with them. But it sounds to me
like you want to start by adjusting osd_scrub_priority (default 5) and
osd_scrub_cost (default 50 << 20, i.e. 50MB). Those directly affect how
scrub ops move through the queue in relation to client ops.

(There is also the family of scrub scheduling options, which might make
sense if you are more tolerant of slow IO at certain times of the
day/week, but I'm not familiar with them.)

-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
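
For reference, the two options Greg mentions can be set in ceph.conf under
the [osd] section. The values below are illustrative starting points only,
not tested recommendations; experiment and measure on your own cluster.

```
[osd]
# Priority of scrub ops relative to client ops (default 5).
osd scrub priority = 1
# Cost charged to the queue per scrub op (default 50 << 20, i.e. 50MB).
osd scrub cost = 52428800
```

Both can also be changed at runtime on running OSDs, e.g.:
`ceph tell osd.* injectargs '--osd_scrub_priority 1'`.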
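
For anyone following along, the pseudo code quoted above can be turned into
a small runnable Python sketch of the suspected inversion. This is a toy
model of the described token-bucket behavior only, not the actual Ceph
WeightedPriorityQueue implementation; all class names, constants, and the
fixed per-op cost are illustrative assumptions.

```python
from collections import defaultdict


class TokenBucketPrioQueue:
    """Toy model: a strict subqueue, per-priority FIFOs, and a shared
    token bucket. While tokens remain, higher priorities are served
    first; once the bucket is drained, the order inverts."""

    def __init__(self, max_tokens=100, refill_rate=10, min_cost=20):
        self.strict = []                 # the "subqueue" in the pseudo code
        self.queues = defaultdict(list)  # priority -> FIFO of ops
        self.tokens = max_tokens
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.min_cost = min_cost         # illustrative fixed cost per op

    def enqueue(self, priority, op, strict=False):
        if strict:
            self.strict.append(op)
        else:
            self.queues[priority].append(op)

    def dequeue(self):
        op = None
        if self.strict:
            op = self.strict.pop(0)
        elif self.tokens >= self.min_cost:
            # Tokens available: serve the highest priority first.
            for prio in sorted(self.queues, reverse=True):
                if self.queues[prio]:
                    op = self.queues[prio].pop(0)
                    self.tokens -= self.min_cost
                    break
        else:
            # Bucket drained: order inverts, lowest priority first.
            for prio in sorted(self.queues):
                if self.queues[prio]:
                    op = self.queues[prio].pop(0)
                    self.tokens -= self.min_cost
                    break
        self.tokens = min(self.tokens + self.refill_rate, self.max_tokens)
        return op


# Demonstration of the inversion: once the bucket is drained, the
# low-priority scrub op is served before the waiting high-priority client op.
q = TokenBucketPrioQueue(max_tokens=40, refill_rate=0, min_cost=20)
q.enqueue(63, "client_read")
q.enqueue(5, "deep_scrub")
q.dequeue()                  # "client_read": tokens available, high prio wins
q.enqueue(63, "client_read_2")
q.dequeue()                  # "client_read_2": this drains the bucket
q.enqueue(63, "client_read_3")
q.dequeue()                  # "deep_scrub": bucket empty, order inverts
```

If scrub ops are expensive enough to keep the bucket empty, this model
reproduces exactly the symptom described: client ops queue behind scrub
despite their higher priority.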