> On 24 Aug 2017, at 17:40, donglifecomm@xxxxxxxxx wrote:
>
> ZhengYan,
>
> I have run into a problem. To reproduce it, follow the steps below:
>
> 1. Create a 30G file named test823.
> 2. On the host1 client (kernel 4.12.8):
>     cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
>     ls -al /mnt/cephfs/a/*
>
> 3. On the host2 client (kernel 4.12.8):
>     while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; done   # loop: copy the file repeatedly
>     cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
>     ls -al /mnt/cephfs/a/*
>
> 4. The host2 client hangs. The stack is:
> [ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 9462.756838] bash D 0 32738 14988 0x00000084
> [ 9462.758568] Call Trace:
> [ 9462.759945] __schedule+0x28a/0x880
> [ 9462.761414] schedule+0x36/0x80
> [ 9462.762835] rwsem_down_write_failed+0x20d/0x380
> [ 9462.764433] call_rwsem_down_write_failed+0x17/0x30
> [ 9462.766075] ? __ceph_getxattr+0x340/0x340 [ceph]
> [ 9462.767693] down_write+0x2d/0x40
> [ 9462.769175] do_truncate+0x67/0xc0
> [ 9462.770642] path_openat+0xaba/0x13b0
> [ 9462.772136] do_filp_open+0x91/0x100
> [ 9462.773616] ? __check_object_size+0x159/0x190
> [ 9462.775156] ? __alloc_fd+0x46/0x170
> [ 9462.776574] do_sys_open+0x124/0x210
> [ 9462.777972] SyS_open+0x1e/0x20
> [ 9462.779320] do_syscall_64+0x67/0x150
> [ 9462.780736] entry_SYSCALL64_slow_path+0x25/0x25
>
> [root@cephtest ~]# cat /proc/29541/stack
> [<ffffffffa0567b53>] ceph_mdsc_do_request+0x183/0x240 [ceph]
> [<ffffffffa054785c>] __ceph_setattr+0x3fc/0x8b0 [ceph]
> [<ffffffffa0547d4c>] ceph_setattr+0x3c/0x60 [ceph]
> [<ffffffff812623b6>] notify_change+0x266/0x440
> [<ffffffff8123cd85>] do_truncate+0x75/0xc0
> [<ffffffff8124f7aa>] path_openat+0xaba/0x13b0
> [<ffffffff81251c81>] do_filp_open+0x91/0x100
> [<ffffffff8123e304>] do_sys_open+0x124/0x210
> [<ffffffff8123e40e>] SyS_open+0x1e/0x20
> [<ffffffff81003a07>] do_syscall_64+0x67/0x150
> [<ffffffff817b1427>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [root@cephtest ~]# cat /proc/32738/stack
> [<ffffffff8139a617>] call_rwsem_down_write_failed+0x17/0x30
> [<ffffffff8123cd77>] do_truncate+0x67/0xc0
> [<ffffffff8124f7aa>] path_openat+0xaba/0x13b0
> [<ffffffff81251c81>] do_filp_open+0x91/0x100
> [<ffffffff8123e304>] do_sys_open+0x124/0x210
> [<ffffffff8123e40e>] SyS_open+0x1e/0x20
> [<ffffffff81003a07>] do_syscall_64+0x67/0x150
> [<ffffffff817b1427>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> The Ceph log is:
> f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
> 2017-08-24 17:16:00.219523 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000424 pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
> 2017-08-24 17:16:00.219534 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000521 pending pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
> 2017-08-24 17:16:00.219545 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000523 pending pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
> 2017-08-24 17:16:00.219574 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000528 pending pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
> 2017-08-24 17:16:00.219592 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000052a pending pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
> 2017-08-24 17:16:00.219606 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000052c pending pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
> 2017-08-24 17:16:00.219618 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000052f pending pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
> 2017-08-24 17:16:00.219630 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000040b pending pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
> 2017-08-24 17:16:00.219741 7f746db8f700 0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for > 1941.487238 secs
> 2017-08-24 17:16:00.219753 7f746db8f700 0 log_channel(cluster) log [WRN] : slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: client_request(client.268101:1122217 getattr pAsLsXsFs #1000000040b 2017-08-24 16:44:00.463827) currently failed to rdlock, waiting

Please check whether there are hung requests in /sys/kernel/debug/ceph/*/osdc. It’s likely that the kernel was unable to flush dirty pages.

Regards
Yan, Zheng

>
> Thanks a lot.
>
> donglifecomm@xxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
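Yan's suggestion to look for hung requests in /sys/kernel/debug/ceph/*/osdc can be turned into a small check to run on host2 while the cp/cat is blocked. This is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root; `dump_osdc` is a hypothetical helper name, not a Ceph tool, and the Dirty/Writeback lookup in /proc/meminfo is an extra check for the "unable to flush dirty pages" theory, not something Yan asked for.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not an official Ceph tool): print the
# in-flight OSD request table that each CephFS kernel client exposes in
# debugfs, then the kernel's Dirty/Writeback page counters.
# Assumes debugfs is mounted at /sys/kernel/debug and the caller is root.
dump_osdc() {
    # Optional first argument overrides the debugfs ceph directory.
    dir="${1:-/sys/kernel/debug/ceph}"
    found=0
    for osdc in "$dir"/*/osdc; do
        # An unmatched glob stays literal, so skip it if the file is absent.
        [ -e "$osdc" ] || continue
        found=1
        echo "== $osdc"
        cat "$osdc"
    done
    if [ "$found" -eq 0 ]; then
        echo "no osdc files under $dir (is debugfs mounted?)" >&2
        return 1
    fi
    # Dirty/Writeback values that stay large and never shrink would point at
    # pages the client cannot flush to the OSDs.
    if [ -r /proc/meminfo ]; then
        grep -E '^(Dirty|Writeback):' /proc/meminfo
    fi
    return 0
}
```

Source the file and run `dump_osdc` (or `dump_osdc /some/other/debugfs/ceph` for a non-default mount point). The sketch deliberately does not parse the osdc output and just prints it; if the same OSD requests keep appearing across several runs, that would fit Yan's theory that writeback is stuck.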