Thanks Eugen,

The issue is pretty much in the rearview now. It's correcting the last
2.5M misplaced objects. The OSDs are now evenly balanced at 77% usage,
but we will be adding another 120 OSDs all the same.

Thanks,
Marco

On Thu, Jan 28, 2021 at 8:16 AM Eugen Block <eblock@xxxxxx> wrote:

> What are your full ratios? The defaults are:
>
>      "mon_osd_backfillfull_ratio": "0.900000",
>      "mon_osd_full_ratio": "0.950000",
>
> You could temporarily increase mon_osd_backfillfull_ratio a bit and
> see if that resolves it. But it's not recommended to let an OSD get
> really full, so be careful with that. Do you have the option to add
> more disks?
>
>
> Zitat von Marco Pizzolo <marcopizzolo@xxxxxxxxx>:
>
> > Hello Everyone,
> >
> > We seem to be having a problem on one of our Ceph clusters after the
> > OS patch and reboot of one of the nodes. The three other nodes are
> > showing OSD fill rates of 77%-81%, but the 60 OSDs contained in the
> > host that was just rebooted have been varying between 64% and 90%
> > since the reboot. The three other nodes have not yet been patched or
> > rebooted.
> >
> > The result is:
> >
> >   health: HEALTH_WARN
> >           15 nearfull osd(s)
> >           7 pool(s) nearfull
> >           Low space hindering backfill (add storage if this doesn't
> >               resolve itself): 15 pgs backfill_toofull
> >           Degraded data redundancy: 170940/1437684990 objects degraded
> >               (0.012%), 4 pgs degraded, 4 pgs undersized
> >
> >   services:
> >     mon: 3 daemons, quorum prdceph01,prdceph02,prdceph03 (age 6h)
> >     mgr: prdceph01(active, since 5w), standbys: prdceph02, prdceph03,
> >          prdceph04
> >     mds: ArchiveRepository:1 {0=prdceph01=up:active} 3 up:standby
> >     osd: 240 osds: 240 up (since 6h), 240 in (since 27h); 16 remapped pgs
> >
> >   task status:
> >     scrub status:
> >       mds.prdceph01: idle
> >
> >   data:
> >     pools:   7 pools, 8384 pgs
> >     objects: 479.23M objects, 557 TiB
> >     usage:   1.7 PiB used, 454 TiB / 2.1 PiB avail
> >     pgs:     170940/1437684990 objects degraded (0.012%)
> >              4155186/1437684990 objects misplaced (0.289%)
> >              8332 active+clean
> >              36   active+clean+scrubbing+deep
> >              11   active+remapped+backfill_toofull
> >              2    active+undersized+degraded+remapped+backfill_toofull
> >              2    active+forced_recovery+undersized+degraded+remapped+forced_backfill+backfill_toofull
> >              1    active+remapped+backfilling
> >
> >   io:
> >     client:   9.6 MiB/s rd, 820 KiB/s wr, 1.02k op/s rd, 189 op/s wr
> >     recovery: 0 B/s, 25 keys/s, 10 objects/s
> >
> > Any suggestions would be greatly appreciated, as currently it is not
> > able to complete the repair, nor will it backfill, even when we
> > attempt to force it.
> >
> > Many thanks in advance.
> >
> > Marco

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
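
[Editor's note] For readers landing on this thread later, Eugen's suggestion of
temporarily raising the backfillfull ratio maps onto the Ceph CLI roughly as
sketched below. These commands are not from the original thread; the 0.92 value
is purely illustrative. Pick a ratio only slightly above your fullest OSD and
revert it once backfill completes:

    # show the ratios currently in effect on the cluster
    ceph osd dump | grep ratio

    # temporarily raise the backfillfull threshold (example value only)
    ceph osd set-backfillfull-ratio 0.92

    # watch the backfill_toofull PGs drain as backfill resumes
    ceph health detail | grep backfill_toofull

    # once backfill has completed, restore the default
    ceph osd set-backfillfull-ratio 0.90

On Luminous and later these ratios live in the OSDMap, so the set-*-ratio
commands take effect cluster-wide immediately, without restarting any daemons.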