Re: Ceph MDS OOM in combination with 6.5.1 kernel client

Hi Stefan,

Can you tell whether the memory growth is due to the cache not being trimmed fast enough, or something else? It would also be worth trying to track down whether the 6.5.1 client isn't releasing caps properly. Dan Van der Ster might have some insight here as well.
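A rough way to check that from the MDS side (just a sketch, not tested against your cluster; it assumes rank 0 is the active MDS and relies on the num_caps / client_metadata fields that "session ls" prints on pacific) is to poll the session list next to the cache status and watch whether one client's cap count keeps climbing while the cache grows:

#!/usr/bin/env python3
# Sketch: poll the active MDS and print its cache status plus the five
# sessions holding the most caps, so a cap-hoarding 6.5.1 client stands out.
import json
import subprocess
import time

MDS = "mds.0"  # assumption: rank 0 is the active MDS for this filesystem

def ceph_tell(*args):
    out = subprocess.check_output(["ceph", "tell", MDS, *args])
    return json.loads(out)

while True:
    sessions = ceph_tell("session", "ls")
    cache = ceph_tell("cache", "status")
    print(time.strftime("%H:%M:%S"), "cache:", cache.get("pool", {}))
    top = sorted(sessions, key=lambda s: s.get("num_caps", 0), reverse=True)[:5]
    for s in top:
        meta = s.get("client_metadata", {})
        print("  client", s.get("id"), meta.get("hostname", "?"),
              "kernel", meta.get("kernel_version", "?"),
              "caps", s.get("num_caps"))
    time.sleep(10)

If the per-client cap counts stay flat while the cache balloons, the problem is more likely on the trimming side than with cap release.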

Mark

On 9/19/23 03:57, Stefan Kooman wrote:
Hi List,

For those of you brave enough to run the 6.5 CephFS kernel client: we are seeing some interesting things happening. Some of this might be related to this thread [1]. On a couple of shared webhosting platforms we are running CephFS with the 6.5.1 kernel. We have set "workqueue.cpu_intensive_thresh_us=0" (to prevent CephFS work items from being flagged as CPU-intensive). Since then we have seen two MDS OOM situations. The MDS allocates ~60 GiB of RAM above baseline in ~50 seconds. In both OOM situations, shortly before the OOM, there is a spike of network traffic going out of the MDS to a 6.5.1 kernel client: that node receives ~700 MiB/s of MDS traffic for roughly the same ~50 seconds before the MDS process gets killed. Nothing is logged about this: Ceph stays HEALTH_OK, and neither the kernel client nor the MDS logs anything. The MDS rejoins and is up and active again after a couple of minutes. There is no increased load on the MDS or the client that would explain this (as far as we can see).
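In case anyone wants to correlate this from the client side: a minimal sketch (assuming debugfs is mounted at /sys/kernel/debug, it is run as root on the kernel client, and noting that the exact summary layout of the caps file varies a bit between kernels) that samples the cap counters so a spike can be lined up with the MDS traffic burst:

#!/usr/bin/env python3
# Sketch: sample the kernel client's cap counters from debugfs every few
# seconds; only the summary lines at the top of the file are kept, since
# the per-cap table below them can be huge.
import glob
import itertools
import time

while True:
    for caps_file in glob.glob("/sys/kernel/debug/ceph/*/caps"):
        with open(caps_file) as f:
            summary = [line.strip() for line in itertools.islice(f, 5)]
        print(time.strftime("%H:%M:%S"), caps_file, *summary, sep=" | ")
    time.sleep(5)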

At this point I don't expect anyone to tell me what the issue is based on these symptoms alone. But if you encounter similar issues, please update this thread. I'm pretty certain we are hitting a bug (or bugs), as the MDS should not blow itself up like that under any circumstances; it should evict a misbehaving client instead.

Ceph MDS 16.2.11, MDS_MEMORY_TARGET=160 GiB.

Gr. Stefan

[1]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YR5UNKBOKDHPL2PV4J75ZIUNI4HNMC2W/

--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nelson@xxxxxxxxx

We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


