Hi all!
We are in a situation where we have 3 PGs in
"active+remapped+backfill_toofull". It happened after we executed a
"gentle-reweight" to zero of one OSD (osd.77) in order to swap it with
a new one (the current disk registered some read errors and is being
replaced as a precaution).
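For context, by "gentle-reweight" I mean lowering the CRUSH weight of
the OSD in small steps and letting the cluster settle in between,
roughly along these lines (step sizes and the wait loop are only a
sketch, not our exact script):

for w in 4.0 3.0 2.0 1.0 0.5 0; do
    # lower the CRUSH weight of osd.77 one step at a time
    ceph osd crush reweight osd.77 $w
    # wait until the backfill triggered by this step has finished
    while ceph pg stat | grep -q 'backfill\|recover'; do sleep 60; done
done
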
# ceph health detail:
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 3 pgs backfill_toofull
pg 10.46c is active+remapped+backfill_toofull, acting [77,607,96]
pg 10.8ad is active+remapped+backfill_toofull, acting [577,152,77]
pg 10.b15 is active+remapped+backfill_toofull, acting [483,348,77]
Our cluster is a little unbalanced and we have 7 OSDs nearfull, though
not by much (all below 88%). I think the imbalance comes from having 4
nodes with 6 TB disks while the other 19 have 10 TB disks, but that
should be unrelated to the backfill_toofull. I'm not too worried about
it: we will add new storage this month (if the servers arrive) and get
rid of the old 6 TB servers.
If I dump the PGs I see, if I'm not mistaken, that osd.77 will be
"replaced" by osd.60, which is one of the nearfull OSDs (the fullest
one, at 87.53% used).
# ceph pg dump:
10.b15 37236 0 0 37236 0 155249620992 0 0 5265 5265 active+remapped+backfill_toofull 2023-01-16T14:45:46.155801+0100 305742'144106 305742:901513 [483,348,60] 483 [483,348,77] 483 305211'144056 2023-01-11T10:20:56.600135+0100 305211'144056 2023-01-11T10:20:56.600135+0100 0
10.8ad 37518 0 0 37518 0 156345024512 0 0 5517 5517 active+remapped+backfill_toofull 2023-01-16T14:45:38.510038+0100 305213'142117 305742:937228 [577,60,152] 577 [577,152,77] 577 303828'142043 2023-01-06T17:52:02.523104+0100 303334'141645 2022-12-20T17:39:22.668083+0100 0
10.46c 36710 0 0 36710 0 153023443456 0 0 8172 8172 active+remapped+backfill_toofull 2023-01-16T14:45:29.284223+0100 305298'141437 305741:877331 [60,607,96] 60 [77,607,96] 77 304802'141358 2023-01-08T21:39:23.469198+0100 304363'141349 2023-01-01T18:13:45.645494+0100 0
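The same up vs. acting mapping is easier to read for a single PG with,
for example:

# ceph pg map 10.46c

which should report up [60,607,96] and acting [77,607,96], matching
the dump above.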
# ceph osd df:
60 hdd 5.45999 1.00000 5.5 TiB 4.8 TiB 697 GiB 128 MiB 0 B 697 GiB 87.53 1.29 37 up
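For completeness, I assume the full/backfillfull/nearfull thresholds
are still at their defaults (0.95 / 0.90 / 0.85); they can be checked
with:

# ceph osd dump | grep ratio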
In this situation, what is the correct way to address the problem?
Should I reweight-by-utilization to free up some space on osd.60 (the
OSD is a 6 TB disk, and other OSDs on the same host are also
nearfull)? Is there another way to manually map a PG to a different
OSD?
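To be concrete, these are the two options I have in mind (only
sketches, I have not run them yet; the PG and OSD ids are taken from
the output above, the target OSD is a placeholder):

# dry-run first, then the actual reweight-by-utilization
ceph osd test-reweight-by-utilization
ceph osd reweight-by-utilization

# or pin a single PG away from the nearfull OSD via the upmap mechanism
# (requires "ceph osd set-require-min-compat-client luminous" or later)
ceph osd pg-upmap-items 10.46c 60 <target-osd>
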
Thank you for your attention
Iztok Gregori