Re: Prioritise recovery on specific PGs/OSDs?

David Turner <drakonstein@xxxxxxxxx> · Tue, 20 Jun 2017 14:58:12 +0000

Setting an osd's weight to 0.0 in the crush map will tell all PGs to move off of the osd. It's pretty much the same as removing the osd from the cluster, except it allows the osd to help move the data that it has and prevents having degraded PGs and objects while you do it. The limit to weighting osds to 0.0 is how full your cluster and remaining osds will be when the 0.0 osds are empty.
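
As a rough way to check that limit before draining, here is a minimal sketch. It assumes the data on the drained osds spreads evenly over the remaining ones, which CRUSH does not guarantee, so treat the result as optimistic. The function name, the input layout, and the 0.85 threshold in the comment are illustrative assumptions, not anything Ceph provides.

```python
def projected_utilization(osds, draining):
    """Estimate cluster fullness once the draining osds are empty.

    osds:     dict of osd id -> (used, size), in any consistent unit
              (hypothetical layout, e.g. filled in by hand from `ceph osd df`)
    draining: set of osd ids that will be weighted to 0.0
    Assumes the data redistributes evenly over the remaining osds.
    """
    total_used = sum(used for used, _size in osds.values())
    remaining_size = sum(
        size for osd_id, (_used, size) in osds.items() if osd_id not in draining
    )
    if remaining_size == 0:
        raise ValueError("no capacity would remain after draining")
    return total_used / remaining_size


# Four osds of 100 units, each holding 60: draining one leaves
# 240 units of data on 300 units of capacity, i.e. 0.8 full.
example = {0: (60, 100), 1: (60, 100), 2: (60, 100), 3: (60, 100)}
ratio = projected_utilization(example, {3})
# Compare against whatever nearfull ratio your cluster uses
# (0.85 is assumed here purely for illustration).
safe_to_drain = ratio < 0.85
```

Individual osds can end up noticeably fuller than this average, so leave headroom.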

On Tue, Jun 20, 2017, 10:29 AM Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:

      These settings can be set on a specific OSD:

        osd recovery max active = 1
        osd max backfills = 1

      I don't know if it will behave as you expect if you set 0... (I
      tested setting 0 which didn't complain, but is 0 actually 0 or
      unlimited or an error?)

      Maybe you could parse the ceph pg dump, then look at the pgs that
      list your special osds, then set all of the listed osds (not just
      special ones) config to 1 and the rest 0. But this will not
      prioritize specific pgs... or even specific osds, and maybe it'll
      end up being all osds.

      To further add to your criteria, you could select ones where the
      direction of movement is how you want it... like if up (where
      CRUSH wants the data after recovery is done) says [1,2,3] and
      acting (where it is now, even partial pgs I think) says [1,2,7]
      and you want to empty 7, then you have to set the numbers non-zero
      for osd 3 and 7, but maybe not 1 or 2 (although these could be
      read as part of recovery).
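
The up/acting comparison described above could be sketched like this. It is only an illustration of the selection logic: it assumes you have already extracted a list of per-PG records with "pgid", "up" and "acting" fields from the JSON form of ceph pg dump (where exactly that list sits in the output varies by release), and the function name is made up.

```python
def osds_to_unthrottle(pg_stats, draining):
    """Pick PGs that are moving off the draining osds, and the osds involved.

    pg_stats: list of dicts with "pgid", "up" and "acting" keys
              (assumed to be pre-extracted from a pg dump)
    draining: set of osd ids you want emptied
    Returns (pgids, osds): the selected PG ids and the osds that would
    need non-zero recovery/backfill settings for those PGs to move.
    """
    selected_pgs = []
    osds = set()
    for pg in pg_stats:
        up, acting = set(pg["up"]), set(pg["acting"])
        # osds that hold the PG now but are not where CRUSH wants it
        leaving = (acting - up) & draining
        if not leaving:
            continue
        selected_pgs.append(pg["pgid"])
        osds |= leaving        # sources being emptied
        osds |= up - acting    # backfill targets
    return selected_pgs, osds


# The example from the text: up [1,2,3], acting [1,2,7], emptying osd 7.
pgs = [
    {"pgid": "1.a", "up": [1, 2, 3], "acting": [1, 2, 7]},
    {"pgid": "1.b", "up": [4, 5, 6], "acting": [4, 5, 8]},
]
pgids, osds = osds_to_unthrottle(pgs, {7})
# pgids is ["1.a"] and osds is {3, 7}; osds 1 and 2 are left out,
# although they may still be read from during recovery.
```

Whether setting the limits to 0 on all the other osds behaves as hoped is still the open question, so this only tells you which osds to leave non-zero.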

      I'm sure it's doomed to fail, but you can try it out on a test
      cluster.

      My guess is it will either not accept 0 like you expect, or it
      will only be a small fraction of your osds that you can set to 0.

      On 06/20/17 14:44, Richard Hesketh wrote:

      Is there a way, either by individual PG or by OSD, I can prioritise backfill/recovery on a set of PGs which are currently particularly important to me?

For context, I am replacing disks in a 5-node Jewel cluster, on a node-by-node basis - mark out the OSDs on a node, wait for them to clear, replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've done my first node, but the significant CRUSH map changes mean most of my data is moving. I only currently care about the PGs on my next set of OSDs to replace - the other remapped PGs I don't care about settling because they're only going to end up moving around again after I do the next set of disks. I do want the PGs specifically on the OSDs I am about to replace to backfill because I don't want to compromise data integrity by downing them while they host active PGs. If I could specifically prioritise the backfill on those PGs/OSDs, I could get on with replacing disks without worrying about causing degraded PGs.

I'm in a situation right now where there are merely a couple of dozen PGs on the disks I want to replace, which are all remapped and waiting to backfill - but there are 2200 other PGs also waiting to backfill because they've moved around too, and it's extremely frustrating to be sat waiting to see when the ones I care about will finally be handled so I can get on with replacing those disks.

Rich

      _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------
