Re: Prioritized pool recovery

On Mon, May 6, 2019 at 6:41 PM Kyle Brantley <kyle@xxxxxxxxxxxxxx> wrote:
>
> On 5/6/2019 6:37 PM, Gregory Farnum wrote:
> > Hmm, I didn't know we had this functionality before. It looks to be
> > changing quite a lot at the moment, so be aware this will likely
> > require reconfiguring later.
>
> Good to know, and not a problem. In any case, I'd assume it won't change substantially for luminous, correct?
>
>
> > I'm not seeing this in the luminous docs, are you sure? The source
>
> You're probably right, but there are options for this in luminous:
>
> # ceph osd pool get vm
> Invalid command: missing required parameter var([...] recovery_priority|recovery_op_priority [...])
>
>
> > code indicates in Luminous it's 0-254. (As I said, things have
> > changed, so in the current master build it seems to be -10 to 10 and
> > configured a bit differently.)
>
> > The 1-63 values generally apply to op priorities within the OSD, and
> > are used as part of a weighted priority queue when selecting the next
> > op to work on out of those available; you may have been looking at
> > osd_recovery_op_priority which is on that scale and should apply to
> > individual recovery messages/ops but will not work to schedule PGs
> > differently.
>
> So I was probably looking at the OSD level then.

Ah sorry, I looked at the recovery_priority option and skipped
recovery_op_priority entirely.

So recovery_op_priority sets the priority on the message dispatch
itself and is on the 0-63 scale. I wouldn't mess around with that; the
higher you set it, the more recovery messages get dispatched relative
to client operations.
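
If you just want to check what a pool currently has, this should show it
(untested, and "vm" is just the pool name from your example):

# ceph osd pool get vm recovery_op_priority
# ceph osd pool get vm recovery_priority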

>
> >
> >> Questions:
> >> 1) If I have pools 1-4, what would I set these values to in order to backfill pools 1, 2, 3, and then 4 in order?
> >
> > So if I'm reading the code right, they just need to be different
> > weights, and the higher value will win when trying to get a
> > reservation if there's a queue of them. (However, it's possible that
> > lower-priority pools will send off requests first and get to do one or
> > two PGs first, then the higher-priority pool will get to do all its
> > work before that pool continues.)
>
> Where higher is 0, or higher is 254? And what's the difference between recovery_priority and recovery_op_priority?

For recovery_priority, larger numbers are higher priority. When picking
a PG off the list of pending reservations, it takes the highest-priority
PG it sees, and within that priority, the first request to have come in.
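
So for your pools 1-4 question, something along these lines should do it
(untested, and pool1/pool2/etc. are just placeholder names for your pools):

# ceph osd pool set pool1 recovery_priority 4
# ceph osd pool set pool2 recovery_priority 3
# ceph osd pool set pool3 recovery_priority 2
# ceph osd pool set pool4 recovery_priority 1

You can confirm what got applied with "ceph osd pool ls detail".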

>
> In reading the docs for the OSD, _op_ is "priority set for recovery operations," and non-op is "priority set for recovery work queue." For someone new to ceph such as myself, this reads like the same thing at a glance. Would the recovery operations not be a part of the work queue?
>
> And would this apply the same for the pools?

When a PG needs to recover, it has to acquire a reservation slot on
the local and remote nodes (to limit the total amount of work being
done). It sends off a request, and once the limit on concurrent
reservations is reached, further requests go into a pending queue. The
recovery_priority orders that queue.
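
(If you ever want to turn that knob, the per-OSD cap on reservation slots
is osd_max_backfills, if I remember right; something like this changes it
at runtime, at the cost of more recovery load competing with client I/O:)

# ceph tell osd.* injectargs '--osd-max-backfills 2'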

>
> >
> >> 2) Assuming this is possible, how do I ensure that backfill isn't prioritized over client I/O?
> >
> > This is an ongoing issue but I don't think the pool prioritization
> > will change the existing mechanisms.
>
> Okay, understood. Not a huge problem, I'm primarily looking for understanding.
>
>
> >> 3) Is there a command that enumerates the weights of the current operations (so that I can observe what's going on)?
> >
> > "ceph osd pool ls detail" will include them.
> >
>
> Perfect!
>
> Thank you very much for the information. Once I have a little more, I'm probably going to work towards sending a pull request in for the docs...
>
>
> --Kyle
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


