On 2020-10-21 10:08, Mark Nelson wrote:
On 10/21/20 7:54 AM, Ing. Luis Felipe Domínguez Vega wrote:
On 2020-10-21 08:43, Mark Nelson wrote:
Theoretically we shouldn't be spiking memory as much these days during recovery, but the code is complicated and it's tough to reproduce these kinds of issues in-house. If you happen to catch it in the act, do you see the pglog mempool stats also spiking up?
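A minimal way to watch those stats on a running OSD, assuming its admin socket is reachable on the host (osd.N below is a placeholder for the actual OSD id):

  # Dump per-mempool usage (bytes and item counts) from the OSD's admin socket;
  # watch the osd_pglog and buffer_anon entries while the growth is happening.
  ceph daemon osd.N dump_mempools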
Mark
On 10/21/20 2:34 AM, Dan van der Ster wrote:
Hi,
This might be the pglog issue which has been coming up a few times
on the list.
If the OSD cannot boot without going OOM, you might have success by
trimming the pglog, e.g. search this list for "ceph-objectstore-tool
--op trim-pg-log" for some recipes. The thread "OSDs taking too much
memory, for pglog" in particular might help.
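A rough sketch of that recipe, assuming a systemd-managed OSD and using placeholder values for the OSD id and pgid (check the referenced threads for the exact options before running it):

  # Stop the OSD before operating on its store
  systemctl stop ceph-osd@N
  # Trim the pg log of one PG on that OSD (repeat per oversized PG)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-N \
      --op trim-pg-log --pgid <pgid>
  # Bring the OSD back up afterwards
  systemctl start ceph-osd@N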
Cheers, Dan
On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega
<luis.dominguez@xxxxxxxxx> wrote:
Hi, today my infrastructure provider had a blackout. Afterwards Ceph tried to recover but is stuck in an inconsistent state, because many OSDs cannot recover on their own: the kernel kills them with OOM. Even one OSD that had been OK has now gone down, killed by OOM.
Even on a server with 32 GB of RAM the OSD uses all of it and never recovers. I think this could be a memory leak. Ceph version: Octopus 15.2.3.
In: https://pastebin.pl/view/59089adc
You can see that buffer_anon reaches 32 GB, but why? My whole cluster is down because of this.
This https://pastebin.pl/view/59089adc shows the OSD shortly before it gets killed by OOM.
Ok, that is very interesting! The OSD memory autotuning code shrank the caches to almost nothing to try to compensate for the huge growth in buffer_anon (and, to a lesser extent, osd_pglog) usage, but obviously it couldn't do anything with that much memory being used. Any chance you could create a tracker ticket and paste in the memory pool info along with the Ceph version, etc.?
https://tracker.ceph.com/
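The information asked for above can be gathered with something along these lines (osd.N is a placeholder for the affected OSD):

  # Ceph versions running across the cluster, for the ticket
  ceph versions
  # Allocator heap statistics from the affected OSD, to complement
  # the mempool numbers in the pastebin
  ceph tell osd.N heap stats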
Mark
Thanks, https://tracker.ceph.com/issues/47929
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx