Re: Huge RAM Usage on OSD recovery

On 10/22/20 9:02 AM, Ing. Luis Felipe Domínguez Vega wrote:
On 2020-10-22 09:07, Mark Nelson wrote:
On 10/21/20 10:54 PM, Ing. Luis Felipe Domínguez Vega wrote:
On 2020-10-20 17:57, Ing. Luis Felipe Domínguez Vega wrote:
Hi, today my infrastructure provider had a blackout. Ceph then tried to
recover but is now in an inconsistent state, because many OSDs cannot
recover on their own: the kernel kills them via OOM. Even an OSD that
had been fine has now gone down, OOM-killed.

Even on a server with 32 GB of RAM the OSD uses all of it and never recovers.
I think this could be a memory leak. Ceph version: Octopus 15.2.3.

In https://pastebin.pl/view/59089adc you can see that buffer_anon reaches
32 GB, but why? My whole cluster is down because of that.
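
[A hedged sketch of how one could inspect where that memory is going, assuming the OSD's admin socket is reachable; osd.46 is only an example ID and the 2 GiB value below is illustrative, not a recommendation:]

    # Break down the OSD's memory pools (buffer_anon, bluestore caches, pglog, ...)
    ceph daemon osd.46 dump_mempools

    # Check and, if needed, temporarily lower the per-OSD memory target
    # (note: this caps the caches; it may not stop buffer_anon growth during recovery)
    ceph config get osd.46 osd_memory_target
    ceph config set osd osd_memory_target 2147483648   # 2 GiB, example value
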
I used --op export-remove and then --op import of ceph-objectstore-tool on the failing PG, and now the OSD is running great.
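
[For the archives, a rough sketch of that workaround, assuming the OSD is stopped first; <osd-id>, <pgid> and the export path are placeholders:]

    # Stop the affected OSD so its object store is quiescent
    systemctl stop ceph-osd@<osd-id>

    # Export the problematic PG to a file and remove it from this OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd-id> \
        --pgid <pgid> --op export-remove --file /root/<pgid>.export

    # Import the PG back from the exported file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd-id> \
        --op import --file /root/<pgid>.export

    # Restart the OSD
    systemctl start ceph-osd@<osd-id>
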


That's great news! ...but hopefully we'll figure out what's going on
so we can avoid the problem in the first place. :)


Mark

Hmm, not at all. The OSD is not being killed, but it is using a huge amount of RAM, and there are many log messages like this:

osd.46 osd.46 41072109 : slow request osd_op(client.72068484.0:1851999 5.d 5.1aef4f8d (undecoded) ondisk+write+known_if_redirected e155365) initiated 2020-10-22T11:21:56.949886+0000 currently queued for pg
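
[A hedged sketch of how one might look at those queued requests via the admin socket, again assuming osd.46 and that the daemon is responsive enough to answer:]

    # Ops currently in flight or blocked on the OSD
    ceph daemon osd.46 dump_ops_in_flight
    ceph daemon osd.46 dump_blocked_ops

    # Recently completed ops with their event timelines
    ceph daemon osd.46 dump_historic_ops
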


Do you mean that the --op export-remove and --op import steps didn't end up fixing it?  I had interpreted "running great" to mean the OSD was no longer using tons of memory (but it's not a real fix, just a workaround).


Mark


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



