Re: Huge RAM Usage on OSD recovery


 



On 2020-10-21 10:08, Mark Nelson wrote:
On 10/21/20 7:54 AM, Ing. Luis Felipe Domínguez Vega wrote:
On 2020-10-21 08:43, Mark Nelson wrote:
Theoretically we shouldn't be spiking memory as much these days during
recovery, but the code is complicated and it's tough to reproduce
these kinds of issues in-house.  If you happen to catch it in the act,
do you see the pglog mempool stats also spiking up?
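One way to check that, assuming you can reach the OSD's admin socket on the OSD host (the <id> placeholder and the grep pattern are just a sketch; the exact JSON layout can differ between releases):

    # on the OSD host; shows items/bytes for the mempools of interest
    ceph daemon osd.<id> dump_mempools | grep -A 2 -E 'osd_pglog|buffer_anon'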


Mark


On 10/21/20 2:34 AM, Dan van der Ster wrote:
Hi,

This might be the pglog issue which has been coming up a few times on the list.
If the OSD cannot boot without going OOM, you might have success by
trimming the pglog, e.g. search this list for "ceph-objectstore-tool
--op trim-pg-log" for some recipes. The thread "OSDs taking too much
memory, for pglog" in particular might help.
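As a rough sketch of the kind of recipe meant here (assuming the OSD is stopped first; <id> and <pgid> are placeholders, and the exact thresholds and which PGs to trim are discussed in those threads):

    # stop the OSD so the object store is not in use
    systemctl stop ceph-osd@<id>

    # trim the pg log of one placement group on that OSD's data path
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --pgid <pgid> --op trim-pg-log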

Cheers, Dan



On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega
<luis.dominguez@xxxxxxxxx> wrote:
Hi, today my infrastructure provider had a blackout. Ceph then tried to recover but is stuck in an inconsistent state, because many OSDs cannot recover on their own: the kernel kills them with OOM. Even an OSD that had been fine now goes down, OOM-killed.

Even on a server with 32 GB of RAM the OSD uses all of it and never recovers. I think this could be a memory leak. Ceph version: Octopus 15.2.3.

In https://pastebin.pl/view/59089adc you can see that buffer_anon grows to 32 GB, but why? My whole cluster is down because of that.
This https://pastebin.pl/view/59089adc was captured almost at the moment the OSD was about to be killed by OOM.


Ok, that is very interesting!  The OSD memory autotuning code shrank
the caches to almost nothing to try to compensate for the huge
growth in buffer_anon (and, to a lesser extent, osd_pglog) usage, but
obviously it couldn't do anything with that much memory being used.  Any
chance you could create a tracker ticket and paste in the memory pool
info along with the Ceph version, etc.?


https://tracker.ceph.com/
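For context, the budget the autotuner works against is osd_memory_target; assuming it hasn't been overridden per OSD, something like this shows what the OSDs are being asked to stay under:

    ceph config get osd osd_memory_target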


Mark

Thanks, https://tracker.ceph.com/issues/47929
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



