Hi Adrien,

On Mon, Jul 22, 2024 at 5:17 AM Adrien Georget <adrien.georget@xxxxxxxxxxx> wrote:
>
> Hi,
>
> For the last 2 months, our MDS has frequently been switching to another
> because of a sudden memory leak.
> The host has 128G RAM and most of the time the MDS occupies ~20% of
> memory. Then, in less than 3 minutes, it increases to 100% and crashes with
> tcmalloc: allocation failed.
>
> We tried to run heap stats / perf dump on the host but we couldn't find
> any reason why the memory used by the MDS explodes so quickly.
> MDS log available here:
> https://filesender.renater.fr/?s=download&token=c1e60c3c-7f02-4f1e-b23e-f5b25c0cd2a8
>
> Any idea what could lead to this memory leak? Anything we can try to
> understand what happens or prevent this?
> We use Pacific 16.2.14.

It is probably an instance of this:

https://tracker.ceph.com/issues/66704

Check backports of an MDS fix here:

https://tracker.ceph.com/issues/64977

If you can, use an older kernel, or wait until a release with the
backported fix is available.

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D