Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

On 01/05/2023 11:35, Frank Schilder wrote:
> Hi all,
>
> I think we might be hitting a known problem (https://tracker.ceph.com/issues/57244). I don't want to fail the MDS yet, because we have trouble with older kclients that miss the MDS restart and hold on to cache entries referring to the killed instance, which leads to hanging jobs on our HPC cluster.
>
> I have seen this issue before, and back then there was a process in D-state that had deadlocked itself. Killing that process usually worked and resolved the issue. This time, however, I can't find such a process.
>
> The tracker mentions that one can delete the file/folder. I have the inode number, but I really don't want to run a find on a 1.5 PB file system. Is there a better way to find which path is causing the issue (ask the MDS directly, look at a cache dump, or similar)? Is there an alternative to deleting it or failing the MDS?

Hello,

If you have the inode number, you can retrieve the full path from the inode's backtrace with something like:
 rados getxattr -p $POOL ${ino}.00000000 parent | \
  ceph-dencoder type inode_backtrace_t import - decode dump_json | \
  jq -M '[.ancestors[].dname]' | tr -d '[[",\]]' | \
  awk 't!=""{t=$1 "/" t;}t==""{t=$1;}END{print t}'

Here $POOL is the name of the file system's "default pool", i.e. its default data pool (for files), or of its metadata pool (for directories), and $ino is the inode number in hexadecimal.
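A possible scripted variant of the pipeline above, as a minimal sketch rather than an official tool: the function names (ino_to_hex, backtrace_json_to_path, ino_to_path) are hypothetical helpers, and it assumes rados, ceph-dencoder and jq are installed and that you have read access to the pool. It accepts either 0x-prefixed hex or a decimal inode number (the form stat and ls -i print on a CephFS mount), and it builds the path with jq alone, so names containing spaces are kept intact (the awk step above only keeps the first whitespace-separated field of each name).

```bash
#!/usr/bin/env bash
# Minimal sketch, not an official Ceph tool: the function names are
# hypothetical, and rados, ceph-dencoder and jq are assumed to be installed.

# Print an inode number in the lowercase hex form used in RADOS object names.
# Accepts 0x-prefixed hex or plain decimal (as printed by stat / ls -i).
ino_to_hex() {
    local ino=$1
    if [[ $ino =~ ^0[xX][0-9a-fA-F]+$ ]]; then
        printf '%x\n' "$ino"            # printf parses the 0x prefix itself
    elif [[ $ino =~ ^[0-9]+$ ]]; then
        printf '%x\n' "$((10#$ino))"    # force base 10 so leading zeros are not read as octal
    else
        echo "invalid inode number: $ino" >&2
        return 1
    fi
}

# Read the JSON dump of an inode_backtrace_t on stdin and print the full path.
# ancestors[0] holds the inode's own name and the last entry sits directly
# under the root, so the names are reversed before joining.
backtrace_json_to_path() {
    jq -r '"/" + ([.ancestors[].dname] | reverse | join("/"))'
}

# Usage: ino_to_path POOL INO
# POOL: default data pool for files, metadata pool for directories.
ino_to_path() {
    local pool=$1 hex
    hex=$(ino_to_hex "$2") || return 1
    rados getxattr -p "$pool" "${hex}.00000000" parent |
        ceph-dencoder type inode_backtrace_t import - decode dump_json |
        backtrace_json_to_path
}
```

For example, ino_to_path cephfs_data 0x10000001234, where cephfs_data is a placeholder for your default data pool name and the inode number is made up.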


Loïc.
--
|   Loïc Tortay <tortay@xxxxxxxxxxx>  -     IN2P3 Computing Centre     |
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx