Hmm, I didn't know we had this functionality before. It looks to be changing quite a lot at the moment, so be aware this will likely require reconfiguring later.

On Sun, May 5, 2019 at 10:40 AM Kyle Brantley <kyle@xxxxxxxxxxxxxx> wrote:
>
> I've been running luminous / ceph-12.2.11-0.el7.x86_64 on CentOS 7 for about a month now, and have had a few times when I've needed to recreate the OSDs on a server. (No, I'm not planning on routinely doing this...)
>
> What I've noticed is that recovery will generally be staggered so that the pools on the cluster finish around the same time (+/- a few hours). What I'm hoping to do is prioritize specific pools over others, so that ceph will recover all of pool 1 before it moves on to pool 2, for example.
>
> In the docs, recovery_{,op}_priority both have roughly the same description, which is "the priority set for recovery operations", as well as a valid range of 1-63, default 5. This doesn't tell me whether a value of 1 is considered a higher priority than 63, and it doesn't tell me how it fits in with other ceph operations.

I'm not seeing this in the luminous docs; are you sure? The source code indicates that in Luminous it's 0-254. (As I said, things have changed, so in the current master build it seems to be -10 to 10 and configured a bit differently.)

The 1-63 values generally apply to op priorities within the OSD, and are used as part of a weighted priority queue when selecting the next op to work on out of those available. You may have been looking at osd_recovery_op_priority, which is on that scale and should apply to individual recovery messages/ops, but it will not work to schedule PGs differently.

> Questions:
> 1) If I have pools 1-4, what would I set these values to in order to backfill pools 1, 2, 3, and then 4 in order?

So if I'm reading the code right, they just need to be different weights, and the higher value will win when trying to get a reservation if there's a queue of them. (However, it's possible that lower-priority pools will send off requests first and get to do one or two PGs first; the higher-priority pool will then get to do all its work before that pool continues.)

> 2) Assuming this is possible, how do I ensure that backfill isn't prioritized over client I/O?

This is an ongoing issue, but I don't think the pool prioritization will change the existing mechanisms.

> 3) Is there a command that enumerates the weights of the current operations (so that I can observe what's going on)?

"ceph osd pool ls detail" will include them.

> For context, my pools are:
> 1) cephfs_metadata
> 2) vm (RBD pool, VM OS drives)
> 3) storage (RBD pool, VM data drives)
> 4) cephfs_data
>
> These are sorted by both size (smallest to largest) and criticality of recovery (most to least). If there's a critique of this setup / a better way of organizing this, suggestions are welcome.
>
> Thanks,
> --Kyle
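
For what it's worth, if you want to try it, these are per-pool options, so setting them should just be a matter of something like the following with your pool names. The specific numbers are only an example of "higher wins"; I haven't verified anything beyond what's in the source, so treat this as a sketch:

    # give the smaller/more critical pools a higher recovery priority
    ceph osd pool set cephfs_metadata recovery_priority 4
    ceph osd pool set vm recovery_priority 3
    ceph osd pool set storage recovery_priority 2
    ceph osd pool set cephfs_data recovery_priority 1

    # then check what's currently set on each pool
    ceph osd pool ls detail

Since this area is in flux, the relative ordering is what matters, and you may need to revisit the values after upgrading.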