On 5/1/23 17:35, Frank Schilder wrote:
Hi all,

I think we might be hitting a known problem (https://tracker.ceph.com/issues/57244). I don't want to fail the MDS yet, because we have trouble with older kclients that miss the MDS restart and hold on to cache entries referring to the killed instance, leading to hanging jobs on our HPC cluster.
Will this cause any issues in your case?
I have seen this issue before, and there was a process in D-state that had deadlocked itself. Usually, killing this process succeeded and resolved the issue. However, this time I can't find such a process.
BTW, what is the D-state process? A Ceph one? Thanks
The tracker mentions that one can delete the file/folder. I have the inode number, but really don't want to start a find on a 1.5PB file system. Is there a better way to find what path is causing the issue (ask the MDS directly, look at a cache dump, or similar)? Is there an alternative to deletion or MDS fail?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx