cephfs kernel client umount hangs forever

Hi,

Recently I encountered an issue where a cephfs kernel client umount gets
stuck forever. In that state, the call stack of the umount process is
shown below and it seems reasonable:

[~] # cat /proc/985427/stack
[<ffffffff81098bcd>] io_schedule+0xd/0x30
[<ffffffff8111ab6f>] wait_on_page_bit_common+0xdf/0x160
[<ffffffff8111b0ec>] __filemap_fdatawait_range+0xec/0x140
[<ffffffff8111b195>] filemap_fdatawait_keep_errors+0x15/0x40
[<ffffffff811ab5a9>] sync_inodes_sb+0x1e9/0x220
[<ffffffff811b15be>] sync_filesystem+0x4e/0x80
[<ffffffff8118203d>] generic_shutdown_super+0x1d/0x110
[<ffffffffa08a48cc>] ceph_kill_sb+0x2c/0x80 [ceph]
[<ffffffff81181ca4>] deactivate_locked_super+0x34/0x60
[<ffffffff811a2f56>] cleanup_mnt+0x36/0x70
[<ffffffff8108e86f>] task_work_run+0x6f/0x90
[<ffffffff81001a9b>] do_syscall_64+0x27b/0x2c0
[<ffffffff81a00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<ffffffffffffffff>] 0xffffffffffffffff

From the debugfs entry, two write requests are indeed still outstanding,
but I can't figure out why they never complete.
[/sys/kernel/debug/ceph/63be7de3-e137-4b6d-ab75-323b27f21254.client4475]
# cat osdc
REQUESTS 2 homeless 0
36      osd13   1.d069c5d       1.1d    [13,4,0]/13     [13,4,0]/13     e327    10000000028.00000000    0x40002c        2       write
37      osd13   1.8088c98       1.18    [13,6,0]/13     [13,6,0]/13     e327    10000000029.00000000    0x40002c        2       write
LINGER REQUESTS
BACKOFFS

The kernel version is 4.14 with some custom features, and the cluster
is composed of 3 nodes.  On those nodes, CephFS is mounted via the
kernel client, and the issue only happens on one node while the others
unmount CephFS successfully.  I've already checked the upstream patches
and found nothing related.  Currently, I'm trying to reproduce the
issue in an environment with poor network quality (emulated with tc by
adding packet loss, corruption and latency to the network between
client and server); a rough sketch of that setup is shown below.  I
also make the osdmap change much more frequently to trigger request
resends on the client.  But I've had no luck reproducing it with the
above approach so far.
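
For reference, the network emulation is roughly along these lines (the
interface name eth0 and the exact numbers are placeholders, not the
real values I used):

[~] # tc qdisc add dev eth0 root netem delay 100ms 20ms loss 2% corrupt 1%
[~] # tc qdisc del dev eth0 root        (remove the emulation afterwards)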

Are there any suggestions or ideas on how I could investigate the issue
further?  Thanks!

- Jerry


