Re: PG_BACKFILL_FULL

Thanks for your response and advice.

On 16/01/23 15:17, Boris Behrens wrote:
Hmm... I ran into a similar issue.

IMHO there are two ways to work around the problem until the new disk is in place: 1. change the backfill full threshold (I use these commands: https://www.suse.com/support/kb/doc/?id=000019724)

If I understand correctly, the "backfillfull_ratio" is the threshold above which a warning is triggered and the cluster refuses to backfill onto the OSD in question. But my OSD (87.53% used) is not above the ratio (90%). Granted, it is possible that the ratio would be crossed after the 3 PGs are moved to that OSD, but right now we are below it.
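
If I read the SUSE article correctly, temporarily raising the threshold would be something like this (0.92 is only an example value, to be set back to 0.90 once the backfill is done):

  ceph osd set-backfillfull-ratio 0.92   # allow backfill onto OSDs up to 92% full
  ceph osd dump | grep -i ratio          # verify the nearfull/backfillfull/full ratios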

My final goal is to empty the "damaged" OSD/disk, replace it and fill it up again. But I can't do that because it still has 3 PGs on it.
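
For reference, the PGs still sitting on that OSD can be listed with something like:

  ceph pg ls-by-osd 77        # list the PGs currently mapped to osd.77

which should correspond to the 3 PGs from the health detail quoted below.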

2. reweight the backfill full OSDs just a little bit, so they move data to disks that are free enough (i.e. `ceph osd reweight osd.60 0.9`) - if you have enough capacity in the cluster (577+ OSDs should be able to take that :) )

You mean first reweight to (something like) 0.9, let the OSD free up space, and then put it back to 1.0 so the PGs can be backfilled?
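
Just to be sure I understand, the sequence would be roughly (0.9 is only an example):

  ceph osd reweight osd.60 0.9   # temporarily lower the reweight so data moves off osd.60
  # ... wait for the backfill to finish and the 3 PGs to move away ...
  ceph osd reweight osd.60 1.0   # restore the original weight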

Ideally I would just like to manually set the new "location" of the PGs away from the nearfull osd.60. I see there are commands called "ceph osd pg-upmap" and "ceph osd pg-upmap-items" which could be the right tool for what I want to achieve, but I didn't find a lot of information about them. Does somebody know more? Are those tools "safe" to run in my case?
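
From what I gathered so far it would be something like the following, where osd.123 is just a placeholder for an OSD with enough free space in the right failure domain:

  ceph osd set-require-min-compat-client luminous   # upmap needs luminous or newer clients
  ceph osd pg-upmap-items 10.46c 60 123             # map PG 10.46c from osd.60 to osd.123
  ceph pg map 10.46c                                # check the resulting up/acting sets

and "ceph osd rm-pg-upmap-items 10.46c" should remove the mapping again afterwards. But as I said, I'm not sure how safe this is in my case.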

Thanks a lot
Iztok



Cheers
  Boris


On Mon, Jan 16, 2023 at 15:01, Iztok Gregori <iztok.gregori@xxxxxxxxxx> wrote:

    Hi to all!

    We are in a situation where we have 3 PGs in
    "active+remapped+backfill_toofull". It happened when we executed a
    "gentle-reweight" to zero of one OSD (osd.77) to swap it with a new
    one (the current one registered some read errors and is to be
    replaced just in case).

     > # ceph health detail:
     > [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage
    if this doesn't resolve itself): 3 pgs backfill_toofull
     >     pg 10.46c is active+remapped+backfill_toofull, acting [77,607,96]
     >     pg 10.8ad is active+remapped+backfill_toofull, acting [577,152,77]
     >     pg 10.b15 is active+remapped+backfill_toofull, acting [483,348,77]

    Our cluster is a little unbalanced and we have 7 OSDs nearfull,
    though not by much (less than 88%). I think the imbalance is because
    we have 4 nodes with 6 TB disks while the other 19 have 10 TB disks,
    but that should be unrelated to the backfill_toofull. I'm not too
    worried about it: we will add new storage this month (if the servers
    arrive) and get rid of the old 6 TB servers.

    If I dump the PGs I see, if I'm not mistaken, that osd.77 will be
    "replaced" by osd.60, which is one of the nearfull ones (the top one,
    with 87.53% used).


     > # ceph pg dump:
     >
     > 10.b15  37236  0  0  37236  0  155249620992  0  0  5265  5265  active+remapped+backfill_toofull  2023-01-16T14:45:46.155801+0100  305742'144106  305742:901513  [483,348,60]  483  [483,348,77]  483  305211'144056  2023-01-11T10:20:56.600135+0100  305211'144056  2023-01-11T10:20:56.600135+0100  0
     > 10.8ad  37518  0  0  37518  0  156345024512  0  0  5517  5517  active+remapped+backfill_toofull  2023-01-16T14:45:38.510038+0100  305213'142117  305742:937228  [577,60,152]  577  [577,152,77]  577  303828'142043  2023-01-06T17:52:02.523104+0100  303334'141645  2022-12-20T17:39:22.668083+0100  0
     > 10.46c  36710  0  0  36710  0  153023443456  0  0  8172  8172  active+remapped+backfill_toofull  2023-01-16T14:45:29.284223+0100  305298'141437  305741:877331  [60,607,96]  60  [77,607,96]  77  304802'141358  2023-01-08T21:39:23.469198+0100  304363'141349  2023-01-01T18:13:45.645494+0100  0

     > # ceph osd df:
     >  60    hdd   5.45999   1.00000  5.5 TiB  4.8 TiB   697 GiB  128 MiB   0 B   697 GiB  87.53  1.29   37      up

    In this situation, what is the correct way to address the problem?
    Reweight-by-utilization osd.60 to free up space (the OSD is a 6 TB
    disk, and other OSDs on the same host are nearfull)? Is there
    another way to manually map a PG to a different OSD?

    Thank you for your attention

    Iztok Gregori
    _______________________________________________
    ceph-users mailing list -- ceph-users@xxxxxxx
    To unsubscribe send an email to ceph-users-leave@xxxxxxx



--
The self-help group "UTF-8 problems" will meet this time, as an exception, in the large hall.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



