I've been running luminous / ceph-12.2.11-0.el7.x86_64 on CentOS 7 for about a month now, and have had a few occasions where I needed to recreate the OSDs on a server. (No, I'm not planning on doing this routinely...) What I've noticed is that recovery generally staggers itself so that all the pools on the cluster finish around the same time (+/- a few hours). What I'm hoping to do is prioritize specific pools over others, so that ceph recovers all of pool 1 before moving on to pool 2, for example.

In the docs, recovery_priority and recovery_op_priority both have roughly the same description, "the priority set for recovery operations", along with a valid range of 1-63 and a default of 5. That doesn't tell me whether a value of 1 is considered a higher priority than 63, and it doesn't tell me how these fit in with other ceph operations.

Questions:

1) If I have pools 1-4, what would I set these values to in order to backfill pools 1, 2, 3, and then 4, in that order?
2) Assuming this is possible, how do I ensure that backfill isn't prioritized over client I/O?
3) Is there a command that enumerates the weights of the current operations (so that I can observe what's going on)?

For context, my pools are:

1) cephfs_metadata
2) vm (RBD pool, VM OS drives)
3) storage (RBD pool, VM data drives)
4) cephfs_data

These are sorted both by size (smallest to largest) and by criticality of recovery (most to least). If there's a critique of this setup / a better way of organizing it, suggestions are welcome.

Thanks,
--Kyle
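
P.S. In case it helps frame the questions, here's roughly what I was planning to try. The pool-level recovery_priority option and the value 10 are assumptions on my part (I haven't confirmed that option is honored in 12.2.11), and osd.0 is just a placeholder OSD id; the injectargs throttles are the knobs I'm aware of for keeping backfill from crowding out client I/O:

    # Assumption: a per-pool recovery priority knob exists and is honored in 12.2.11
    ceph osd pool set cephfs_metadata recovery_priority 10

    # Throttle backfill/recovery so client I/O keeps some headroom
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-sleep 0.1'

    # On the host running osd.0: check effective values and watch in-flight ops
    ceph daemon osd.0 config show | grep -E 'recovery|backfill'
    ceph daemon osd.0 dump_ops_in_flight

If there's a better-supported way to do this (or if the per-pool option doesn't exist in luminous), I'd rather hear that than keep poking at injectargs.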