Hi,

This might be the pglog issue that has come up a few times on this list. If the OSD cannot boot without going OOM, you might have success trimming the pglog; search this list for "ceph-objectstore-tool --op trim-pg-log" for some recipes. The thread "OSDs taking too much memory, for pglog" in particular might help.

Cheers, Dan

On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega <luis.dominguez@xxxxxxxxx> wrote:
>
> Hi, today my infrastructure provider had a blackout. Ceph then tried to
> recover, but it is in an inconsistent state because many OSDs cannot
> recover: the kernel kills them for running out of memory (OOM). Even one
> OSD that was OK has now gone down, OOM-killed.
>
> Even on a server with 32GB RAM, the OSD uses ALL of it and never recovers.
> I think this could be a memory leak. Ceph version: Octopus 15.2.3.
>
> In: https://pastebin.pl/view/59089adc
> you can see that buffer_anon reaches 32GB, but why? My whole cluster is
> down because of that.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
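
For anyone following up on the pglog trimming suggestion in the reply above, here is a rough sketch of how the per-PG commands might be generated. It is not one of the recipes from the referenced thread, only an illustration under assumptions: the OSD is stopped before the tool is run, the data path follows the conventional /var/lib/ceph/osd/ceph-<id> layout (containerized deployments differ), and the OSD id 7 and PG ids shown are hypothetical. The helper names are made up for this example. The script only prints commands; it does not execute anything.

```python
import shlex
from typing import Iterable, List

TOOL = "ceph-objectstore-tool"


def osd_data_path(osd_id: int) -> str:
    # Assumption: conventional non-containerized OSD data directory.
    return f"/var/lib/ceph/osd/ceph-{osd_id}"


def list_pgs_cmd(data_path: str) -> List[str]:
    # Command that lists the PGs stored on a (stopped) OSD.
    return [TOOL, "--data-path", data_path, "--op", "list-pgs"]


def trim_pg_log_cmd(data_path: str, pgid: str) -> List[str]:
    # Command that trims the pg log of a single PG on a (stopped) OSD.
    return [TOOL, "--data-path", data_path, "--pgid", pgid, "--op", "trim-pg-log"]


def plan_trim(osd_id: int, pgids: Iterable[str]) -> List[str]:
    # Return shell-quoted command lines for review; nothing is executed here.
    path = osd_data_path(osd_id)
    return [shlex.join(trim_pg_log_cmd(path, pg)) for pg in pgids]


if __name__ == "__main__":
    # Hypothetical OSD id and PG ids, for illustration only.
    print(shlex.join(list_pgs_cmd(osd_data_path(7))))
    for line in plan_trim(7, ["2.1f", "2.3a"]):
        print(line)
```

Printing the commands first, rather than running them, leaves room to check the PG list and data path by hand before touching an OSD that is already in a fragile state.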