Commands on cephfs mounts getting stuck in uninterruptible sleep


 



Hello,

I am seeing some commands run on CephFS mounts get stuck in uninterruptible sleep, at which point the only way I can terminate them is to reboot the client. Has anyone experienced anything similar and found a way to safeguard against it?

My mount uses the Ceph kernel driver and is configured in fstab. The resulting mount looks like this:
10.225.44.236,10.225.44.237,10.225.44.238:6789:/albacore/system/deploy on /opt/dcl/deploy type ceph (rw,noatime,name=albacore,secret=<hidden>,acl,wsize=32768,rsize=32768,_netdev)

The vast majority of commands complete successfully on the mounted filesystem, but on one occasion a `chmod -R +r *` command hung indefinitely (despite having run successfully numerous times before). Attempts to terminate the process with `kill` have no effect. Repeated attempts to run the same command get blocked in the same state. `ps` shows the processes stuck in uninterruptible sleep (state D), waiting on an rwsem:

[root@svr01 albacore] ~> ps -Al | grep chmod
4 D     0 18657 18656  0  80   0 - 26998 rwsem_ pts/2    00:00:00 chmod
4 D     0 21835     1  0  80   0 - 26998 rwsem_ ?        00:00:00 chmod
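
The wait channel column above is truncated to "rwsem_". A minimal sketch for collecting the same information without the truncation, assuming a Linux /proc layout: it reads State from /proc/<pid>/status and the wait channel from /proc/<pid>/wchan. The function name list_d_state and its optional directory argument are hypothetical, added only for illustration. On some kernels wchan may be unreadable or show 0, in which case `cat /proc/<pid>/stack` as root may give the kernel stack instead, if the kernel exposes it.

```shell
#!/bin/sh
# Print "PID<TAB>name<TAB>wait channel" for every task in state D
# (uninterruptible sleep). The optional argument replaces /proc and is
# an illustrative assumption, not part of any standard tool.
list_d_state() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # A process may exit while we scan, so tolerate read failures.
        state=$(awk '/^State:/ {print $2; exit}' "$dir/status" 2>/dev/null) || continue
        [ "$state" = "D" ] || continue
        name=$(awk '/^Name:/ {print $2; exit}' "$dir/status" 2>/dev/null) || continue
        wchan=$(cat "$dir/wchan" 2>/dev/null) || wchan=""
        printf '%s\t%s\t%s\n' "${dir##*/}" "$name" "${wchan:-?}"
    done
}

list_d_state
```

Run periodically (from cron, say), this would at least record when a process first enters state D and which kernel symbol it is blocked in.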

Ceph seems to be unaware of the hung process. There are no slow ops or ops in flight, either in the dump_ops_in_flight output on the server or under /sys/kernel/debug/ceph/ on the client. Similarly, dmesg shows nothing for the command or process. Ceph health reports no MDS issues, and the MDS logs contain nothing from the time the processes hung.
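
For reference, the client-side check described above can be sketched roughly as follows, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root. Each kernel client instance has its own directory under /sys/kernel/debug/ceph/, with mdsc and osdc files listing pending MDS and OSD requests. The function name dump_ceph_inflight and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Dump pending MDS (mdsc) and OSD (osdc) requests for each kernel
# CephFS client instance. The optional argument replaces the debugfs
# path and is an illustrative assumption.
dump_ceph_inflight() {
    debug_root="${1:-/sys/kernel/debug/ceph}"
    for client in "$debug_root"/*/; do
        [ -d "$client" ] || continue
        for f in mdsc osdc; do
            [ -r "$client$f" ] || continue
            printf '== %s%s ==\n' "$client" "$f"
            cat "$client$f" || true
        done
    done
}

dump_ceph_inflight
```

A header followed by no lines means that file lists no pending requests, which matches what I see here while the chmod processes are hung.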

The only method I've found of clearing the processes is to reboot my client.

Has anyone got experience with this? Are there ceph mount options that would guard against this? 


Some details of the current setup:
• ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
• We're using the ceph kernel driver, kernel: 5.5.7-1.el7.elrepo.x86_64
• The client server has 38 separate directories mounted, all from the same CephFS filesystem. 
• All 38 directories are mounted with the same config by three separate clients.
• Mount config (in fstab): 10.225.44.236,10.225.44.237,10.225.44.238:6789:/albacore/system/deploy on /opt/dcl/deploy type ceph (rw,noatime,name=albacore,secret=<hidden>,acl,wsize=32768,rsize=32768,_netdev)



Kind regards,

Dave



