Hi,

Is there some way to inform the kernel client and reject all on-going operations when the cluster is destroyed or the CephFS is unreachable?

I find that many operations (e.g., df, statfs, access, ...) hang with the following call stacks when the CephFS is unreachable:

 7752 admin       844 D   umount -f -l /share/ceph_vol
[~] # cat /proc/7752/stack
[<ffffffffa07d02a2>] ceph_mdsc_do_request+0xf2/0x270 [ceph]
[<ffffffffa07b47f3>] __ceph_do_getattr+0xa3/0x1b0 [ceph]
[<ffffffffa07b4925>] ceph_permission+0x25/0x40 [ceph]
[<ffffffff8116960b>] __qnap_inode_permission+0xbb/0x130
[<ffffffff811696d3>] qnap_inode_permission+0x23/0x60
[<ffffffff81169d1f>] link_path_walk+0x23f/0x510
[<ffffffff8116a437>] path_lookupat+0x77/0x100
[<ffffffff8116a555>] filename_lookup+0x95/0x150
[<ffffffff8116a6b5>] user_path_at_empty+0x35/0x40
[<ffffffff81163037>] SyS_readlink+0x47/0xf0
[<ffffffff81a83f57>] entry_SYSCALL_64_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

 3520 admin       992 S   grep reboot
[~] # cat /proc/23739/stack
[<ffffffffa07d0fab>] ceph_mdsc_sync+0x46b/0x690 [ceph]
[<ffffffffa07af40a>] ceph_sync_fs+0x5a/0xc0 [ceph]
[<ffffffff8118e30b>] sync_fs_one_sb+0x1b/0x20
[<ffffffff81161658>] iterate_supers+0xa8/0x100
[<ffffffff8118e410>] sys_sync+0x50/0x90
[<ffffffff81a83f57>] entry_SYSCALL_64_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

 8150 admin      1000 D   /bin/df -k /share/ceph_vol
[~] # cat /proc/8150/stack
[<ffffffffa07d033c>] ceph_mdsc_do_request+0x18c/0x260 [ceph]
[<ffffffffa07b47f3>] __ceph_do_getattr+0xa3/0x1b0 [ceph]
[<ffffffffa07b4963>] ceph_getattr+0x23/0xf0 [ceph]
[<ffffffff811626d7>] vfs_getattr_nosec+0x27/0x40
[<ffffffff81162830>] vfs_fstatat+0x60/0xa0
[<ffffffff81162c8f>] SYSC_newstat+0x1f/0x40
[<ffffffff81162eb9>] SyS_newstat+0x9/0x10
[<ffffffff81a83f57>] entry_SYSCALL_64_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

The Ceph version is "version v11.0.2-1-g5b7012b" and the kernel version is "linux-4.2.8".

Before sending this e-mail, I found a related patch (48fec5d, "ceph: EIO all operations after forced umount"), and it does solve some of the problems in my environment. But sometimes even the forced umount gets stuck forever.

After looking into the code, I find that req->r_timeout is only used during the mount operation (60 sec.); for all other operations, the client waits forever even when the remote cluster is dead. Is there a particular reason that req->r_timeout is left as zero for those requests?

Any ideas will be appreciated, thanks.

- Jerry
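P.S. To make the question concrete, here is a rough sketch of how I read the request wait in ceph_mdsc_do_request() in fs/ceph/mds_client.c. This is paraphrased from memory, not the exact upstream code, but it shows the behavior I am asking about: only a non-zero req->r_timeout bounds the wait, and as far as I can see only the mount path sets it, so ordinary getattr/statfs/... requests block indefinitely when the MDS never answers.

	/*
	 * Sketch only (paraphrased, not the exact upstream code):
	 * the wait inside ceph_mdsc_do_request().
	 */
	if (req->r_timeout) {
		/* mount-time request: bounded wait, EIO on timeout */
		err = (long)wait_for_completion_killable_timeout(
				&req->r_completion, req->r_timeout);
		if (err == 0)
			err = -EIO;		/* timed out */
		else if (err > 0)
			err = 0;		/* completed in time */
	} else {
		/*
		 * Every other request (getattr, statfs, setattr, ...):
		 * r_timeout is zero, so this waits forever unless the
		 * task is killed or the reply arrives.
		 */
		err = wait_for_completion_killable(&req->r_completion);
	}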