Re: Help needed, ceph fs down due to large stray dir

+1 to this, and to the doc mentioned.

Just be aware that the heartbeat grace parameter differs by version: I believe for 16 and below it's the one I mentioned, set at the mon level, while for 17 and newer it's the one Spencer mentioned. The doc he provided covers this as well, along with some other helpful configs.
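
For example (untested sketch; I'm assuming the mon-level option from my earlier mail is mds_beacon_grace, so double-check the exact name against your release's docs before applying):

    # Pacific (16.x) and older: raise the beacon grace on the mons so the
    # rank is not marked laggy/failed while it works through the strays.
    # The option name is an assumption here; verify it for your version.
    ceph config set mon mds_beacon_grace 3600

    # Put it back to the default once the rank is stable again:
    ceph config rm mon mds_beacon_grace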

Again, if you can access the directories served by the MDS rank in question while it's active, see if you can stat some of them.
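
Something as simple as this from a client mount would do (paths are just placeholders for whatever you have mounted):

    # If these return quickly instead of hanging, the rank can still serve
    # metadata for those subtrees.
    stat /mnt/cephfs
    stat /mnt/cephfs/some/large/directory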

Best of luck friend,

Regards,

Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868

On 1/10/25 18:07, Spencer Macphee wrote:
You could try some of the steps here Frank:
https://docs.ceph.com/en/quincy/cephfs/troubleshooting/#avoiding-recovery-roadblocks

mds_heartbeat_reset_grace is probably the only one really relevant to your
scenario.
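
For example, something like this (the value is just illustrative, adjust as needed):

    # Quincy (17.x) and newer: extend the MDS heartbeat grace so the mons
    # don't replace the rank while it is busy recovering/trimming strays.
    ceph config set mds mds_heartbeat_reset_grace 3600

    # Remove the override once the file system has settled:
    ceph config rm mds mds_heartbeat_reset_grace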

On Fri, Jan 10, 2025 at 1:30 PM Frank Schilder <frans@xxxxxx> wrote:

Hi all,

we seem to have a serious issue with our file system; the ceph version is
Pacific latest. After a large cleanup operation we had an MDS rank with
100 million stray entries (yes, one hundred million). Today we restarted this
daemon, which triggers cleanup of the stray entries. This seems to lead to a
restart loop due to OOM: the rank becomes active and then starts pulling
DNS and INOS entries (dentries and inodes) into cache until all memory is
exhausted.

I have no idea whether it at least makes progress removing the stray items or
starts from scratch every time. If it needs to pull as many DNS/INOS entries
into cache as there are stray items, we don't have a server at hand with
enough RAM.

Q1: Is the MDS at least making progress in every restart iteration?
Q2: If not, how do we get this rank up again?
Q3: If we can't get this rank up soon, can we at least move directories
away from this rank by pinning them to another rank?

Currently, the rank in question reports .mds_cache.num_strays=0 in perf
dump.
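
For reference, I'm reading the counter like this on the host running the
daemon (the daemon name is a placeholder):

    # Dump only the mds_cache counters, which include num_strays.
    ceph daemon mds.<daemon-name> perf dump mds_cache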

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx