Thanks Steve!
So peering won't actually move any data around, but will make sure that all PGs know what state they're in? That means that as I start increasing the reweight, PGs will be allocated to the disk but won't actually recover yet; they will, however, be marked as "degraded". Then, when all of the peering is done, I'll unset the norecover/nobackfill flags and backfill will commence, but it will be less I/O-intensive than peering and backfilling at the same time?
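For the archives, I'm planning to verify that peering has settled between each step with something like this (standard CLI, so correct me if there's a better way):

ceph -s             # watch for PGs in peering and for blocked requests
ceph pg stat        # one-line summary of PG states
ceph health detail  # lists blocked requests, if any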
Kind Regards,
David Majchrzak
There are two concerns with setting the reweight to 1.0. The first is peering and the second is backfilling. Peering is going to block client I/O on the affected OSDs, while backfilling will only potentially slow things down.
I don't know what your client I/O looks like, but personally I would set the norecover and nobackfill flags, then slowly increment your reweight value by 0.01 (or whatever you deem appropriate for your environment), waiting for peering to complete between each step. Also allow any resulting blocked requests to clear up before incrementing your reweight again.
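Something like this, for example (untested, and using osd.11 from your df tree below; the 0.01 step is just illustrative):

ceph osd set norecover
ceph osd set nobackfill
ceph osd reweight 11 0.01    # then 0.02, 0.03, ... up to 1.0,
                             # waiting for peering and blocked
                             # requests to clear between steps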
When your reweight is all the way up to 1.0, inject osd_max_backfills to whatever you like (or don't, if you're happy with it as is) and unset the norecover and nobackfill flags to let backfilling begin. If you are unable to handle the impact of backfilling with osd_max_backfills set to 1, then you need to add some new OSDs to your cluster before doing any of this. They will have to backfill too, but at least you'll have more spindles to handle it.
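Roughly like this (the value 2 is just an example; pick whatever your hardware can actually absorb):

ceph tell osd.\* injectargs '--osd-max-backfills 2'   # optional; hammer's default is 1
ceph osd unset norecover
ceph osd unset nobackfill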
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799
On Mon, 2018-01-29 at 22:43 +0100, David Majchrzak wrote:
And of course I totally forgot to include the df tree output in the mail.
Here's the interesting bit from the first two nodes, where osd.11 has weight but is reweighted to 0.
root@osd1:~# ceph osd df tree
ID WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  TYPE NAME
-1 181.99997        -   109T 50848G 60878G     0    0 root default
-2  36.39999        - 37242G 16792G 20449G 45.09 0.99     host osd1
 0   3.64000  1.00000  3724G  1730G  1993G 46.48 1.02         osd.0
 1   3.64000  1.00000  3724G  1666G  2057G 44.75 0.98         osd.1
 2   3.64000  1.00000  3724G  1734G  1989G 46.57 1.02         osd.2
 3   3.64000  1.00000  3724G  1387G  2336G 37.25 0.82         osd.3
 4   3.64000  1.00000  3724G  1722G  2002G 46.24 1.01         osd.4
 6   3.64000  1.00000  3724G  1840G  1883G 49.43 1.08         osd.6
 7   3.64000  1.00000  3724G  1651G  2072G 44.34 0.97         osd.7
 8   3.64000  1.00000  3724G  1747G  1976G 46.93 1.03         osd.8
 9   3.64000  1.00000  3724G  1697G  2026G 45.58 1.00         osd.9
 5   3.64000  1.00000  3724G  1614G  2109G 43.34 0.95         osd.5
-3  36.39999        -      0      0      0     0    0     host osd2
12   3.64000  1.00000  3724G  1730G  1993G 46.46 1.02         osd.12
13   3.64000  1.00000  3724G  1745G  1978G 46.88 1.03         osd.13
14   3.64000  1.00000  3724G  1707G  2016G 45.84 1.01         osd.14
15   3.64000  1.00000  3724G  1540G  2184G 41.35 0.91         osd.15
16   3.64000  1.00000  3724G  1484G  2239G 39.86 0.87         osd.16
18   3.64000  1.00000  3724G  1928G  1796G 51.77 1.14         osd.18
20   3.64000  1.00000  3724G  1767G  1956G 47.45 1.04         osd.20
10   3.64000  1.00000  3724G  1797G  1926G 48.27 1.06         osd.10
49   3.64000  1.00000  3724G  1847G  1877G 49.60 1.09         osd.49
11   3.64000        0      0      0      0     0    0         osd.11
On 29 Jan 2018, at 22:40, David Majchrzak <david@xxxxxxxxxx> wrote:
Hi!
Cluster: 5 HW nodes, 10 HDDs with SSD journals per node, filestore, 0.94.9 hammer, Debian wheezy (scheduled to upgrade once this is fixed).
I have a replaced HDD that another admin set to reweight 0 instead of weight 0 (I can't remember the reason).
What would be the best way to slowly backfill it? Usually I'm using the crush weight and slowly growing it to full size.
I guess if I just set the reweight to 1.0, it will backfill as fast as I let it, that is, at most 1 backfill per OSD, but it will probably disrupt client I/O (this being on hammer).
And if I set the weight on it to 0 instead, the node will get less weight and data will start moving around everywhere, right?
Can I use reweight the same way as weight here, slowly increasing it up to 1.0 in increments of, say, 0.01?
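Just so we're talking about the same knobs, the two commands as I understand them (correct me if I have the syntax wrong; 0.5 is just an example value):

ceph osd crush reweight osd.11 0.5   # the crush weight, which I usually grow slowly
ceph osd reweight 11 0.5             # the reweight/override value (0.0 - 1.0)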
Kind Regards,
David Majchrzak
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com