Re: Huge RAM Usage on OSD recovery

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Wed, 21 Oct 2020 09:34:22 +0200



Hi,

This might be the pglog issue that has come up a few times on this list.
If the OSD cannot boot without going OOM, you might have success by
trimming the pglog. Search this list for
"ceph-objectstore-tool --op trim-pg-log" to find some recipes. The thread
"OSDs taking too much memory, for pglog" in particular might help.
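
As a rough illustration of what such a recipe tends to look like, here is
a minimal Python sketch that lists the PGs on one stopped OSD and builds a
trim-pg-log call for each of them. Treat it as a sketch under assumptions,
not a tested procedure: the data path /var/lib/ceph/osd/ceph-<id> is the
default non-containerized layout and may differ on your hosts, the helper
names are made up for this example, and the flags used (--data-path,
--pgid, --op list-pgs, --op trim-pg-log) should be checked against
"ceph-objectstore-tool --help" for your release. The OSD daemon has to be
stopped before the tool can open its store.

```python
import subprocess


def osd_data_path(osd_id):
    # Assumption: default data directory of a non-containerized OSD.
    return f"/var/lib/ceph/osd/ceph-{osd_id}"


def list_pgs_command(osd_id):
    # Command line that prints one PG id per line for this OSD.
    return ["ceph-objectstore-tool", "--data-path", osd_data_path(osd_id),
            "--op", "list-pgs"]


def trim_command(osd_id, pgid):
    # Command line that trims the pg log of a single PG.
    return ["ceph-objectstore-tool", "--data-path", osd_data_path(osd_id),
            "--pgid", pgid, "--op", "trim-pg-log"]


def run(cmd):
    # Run a command, raise on a non-zero exit status, return its stdout.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def trim_all_pg_logs(osd_id, runner=run, dry_run=True):
    """Build (and optionally run) one trim-pg-log call per PG on the OSD.

    The PG list is always fetched through `runner`, even when dry_run is
    True, so the OSD must already be stopped. With dry_run=True the trim
    commands are only printed.
    """
    pgids = runner(list_pgs_command(osd_id)).split()
    commands = [trim_command(osd_id, pgid) for pgid in pgids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            runner(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical OSD id; review the printed commands before running
    # again with dry_run=False.
    trim_all_pg_logs(0, dry_run=True)
```

Running it with dry_run=True first lets you review the exact commands
before anything is modified, and doing one OSD at a time keeps the blast
radius small if something goes wrong.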

Cheers, Dan


On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega
<luis.dominguez@xxxxxxxxx> wrote:
>
> Hi, today my infrastructure provider had a blackout. Ceph then tried to
> recover, but it is in an inconsistent state because many OSDs cannot
> recover: the kernel kills them via the OOM killer. Even one OSD that was
> OK has now gone down, OOM-killed.
>
> Even on a server with 32GB RAM the OSD uses ALL of it and never recovers.
> I think this could be a memory leak. Ceph version: Octopus 15.2.3
>
> In: https://pastebin.pl/view/59089adc
> you can see that buffer_anon grows to 32GB, but why? My whole cluster is
> down because of that.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx