Hi Stefan,
Can you tell whether the memory being used is due to the cache not being
trimmed fast enough, or something else? You might want to try to track
down whether the 6.5.1 client isn't releasing caps properly or
something. Dan Van der Ster might have some insight here as well.
Mark
On 9/19/23 03:57, Stefan Kooman wrote:
Hi List,
For those of you that are brave enough to run 6.5 CephFS kernel client,
we are seeing some interesting things happening. Some of this might be
related to this thread [1]. On a couple of shared webhosting platforms
we are running CephFS with the 6.5.1 kernel. We have set
"workqueue.cpu_intensive_thresh_us=0" (which disables the workqueue
CPU-intensive detection, so CephFS work items are no longer flagged as
cpu intensive). We have seen two MDS OOM situations after that.
The MDS allocates ~60 GiB of RAM above baseline in ~50 seconds. In
both OOM situations, shortly before the OOM happens, there is a spike
of network traffic going out of the MDS to a kernel client (6.5.1). That
node receives ~700 MiB/s of MDS traffic, also for ~50 seconds, before
the MDS process gets killed. Nothing is logged about this: Ceph is
HEALTH_OK, and there is no logging by the kernel client or the MDS
whatsoever. The MDS rejoins and is up and active after a couple of
minutes. There is no increased load on the MDS or the client that
explains this (as far as we can see).
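For anyone who wants to reproduce our setup: a sketch of how the
workqueue.cpu_intensive_thresh_us=0 setting mentioned above can be
applied (assuming a 6.5 kernel, where this is a writable module
parameter; paths may differ on your distribution):

```shell
# Boot-time: append to the kernel command line (e.g. GRUB_CMDLINE_LINUX):
#   workqueue.cpu_intensive_thresh_us=0

# Runtime: the same knob is exposed as a module parameter (needs root).
echo 0 > /sys/module/workqueue/parameters/cpu_intensive_thresh_us
```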
At this point I don't expect anyone to tell me what the issue is based
on these symptoms alone. But if you encounter similar issues, please
update this thread. I'm pretty certain we are hitting a bug (or bugs),
as the MDS should not blow itself up like that in any case; it should
evict the client instead (if the client is indeed misbehaving).
Ceph MDS 16.2.11, MDS_MEMORY_TARGET=160GiB.
Gr. Stefan
[1]:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YR5UNKBOKDHPL2PV4J75ZIUNI4HNMC2W/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Best Regards,
Mark Nelson
Head of Research and Development
Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nelson@xxxxxxxxx
We are hiring: https://www.clyso.com/jobs/