Re: OSD_FULL after OSD Node Failures

Hi Gerard,

You have very few PGs for very large disks: 50 PGs account for 11 TB of space. Also, full, nearfull, and backfillfull are all set to the same value; you want to set them to different values.

ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90
ceph osd set-full-ratio 0.95
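
A quick way to check the ratios currently in effect (ceph osd dump prints them near the top of its output):

ceph osd dump | grep ratio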

I could imagine a lot of PG movement is happening, and while a PG is being moved its data exists twice, until the new OSD can take over.

What I usually did was reweight the full OSDs to 0.95, just to get things moving.
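
For example, a minimal sketch assuming osd.12 is one of the full OSDs (this is the override reweight, not the CRUSH weight, so it can be reset later):

ceph osd reweight 12 0.95
# once the cluster has recovered:
ceph osd reweight 12 1.0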

When everything is back to normal, try setting the number of PGs higher (2x or even 4x). This also leads to a better data distribution.
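
For example, a sketch assuming a pool named mypool that currently has 64 PGs (pick a power of two that fits your cluster):

ceph osd pool get mypool pg_num
ceph osd pool set mypool pg_num 128
# on releases before Nautilus, pgp_num must be raised as well:
ceph osd pool set mypool pgp_num 128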

Cheers
 Boris

> On 27.12.2024, at 05:09, Gerard Hand <g.hand@xxxxxxxxxxxxxxx> wrote:
> 
> Just to add a bit of additional information:
> - I have tried failing the active MGR, but that appeared to make no difference.
> - I found that restarting the primary OSD on a PG that is reporting backfill_toofull clears the backfill_toofull status.
> 
> It could just be coincidence, but since I've started clearing the backfill_toofull status as soon as it appears, I haven't had any OSDs reporting OSD_FULL.
> 
> Gerard
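
(For reference, a minimal sketch of the restart step described above, assuming PG 2.1a is the one reporting backfill_toofull and OSDs managed directly by systemd; the first OSD in the acting set is the primary:)

ceph pg ls backfill_toofull      # list the affected PGs
ceph pg map 2.1a                 # shows the up/acting sets; the primary is listed first
systemctl restart ceph-osd@3     # assuming osd.3 turned out to be the primary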
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



