Re: cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

ZhengYan,

On the host1 client, ls hangs too. Its stack is:
cat /proc/863/stack
[<ffffffffa05d5b53>] ceph_mdsc_do_request+0x183/0x240 [ceph]
[<ffffffffa05b5e3d>] __ceph_do_getattr+0xcd/0x1d0 [ceph]
[<ffffffffa05b5fbc>] ceph_getattr+0x2c/0x100 [ceph]
[<ffffffff812454fc>] vfs_getattr_nosec+0x9c/0xf0
[<ffffffff81245586>] vfs_getattr+0x36/0x40
[<ffffffff812456ae>] vfs_statx+0x8e/0xe0
[<ffffffff81245c6d>] SYSC_newlstat+0x3d/0x70
[<ffffffff812464fe>] SyS_newlstat+0xe/0x10
[<ffffffff81003a07>] do_syscall_64+0x67/0x150
[<ffffffff817b1427>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
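
In case it helps, the kernel client's pending MDS requests can also be dumped through debugfs (a quick check only; the wildcard stands for the <fsid>.client<id> directory on host1, and debugfs must be mounted):

cat /sys/kernel/debug/ceph/*/mdsc   # in-flight MDS requests, the hung getattr should show up here
cat /sys/kernel/debug/ceph/*/caps   # cap counters for this client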

Thanks a lot.


donglifecomm@xxxxxxxxx
 
From: donglifecomm@xxxxxxxxx
Sent: 2017-08-24 17:40
To: zyan
Cc: ceph-users
Subject: cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
ZhengYan,

I have run into a problem. The steps to reproduce are below (a consolidated script sketch follows the stack traces):

1.  create 30G file test823
2.  host1 client(kernel 4.12.8)
      cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
      ls -al /mnt/cephfs/a/* 

3. host2 client(kernel 4.12.8)
      while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; done   # copy the file in a loop
      cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
      ls -al /mnt/cephfs/a/*
  
4. The host2 client hangs; its stacks are:
[ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9462.756838] bash            D    0 32738  14988 0x00000084
[ 9462.758568] Call Trace:
[ 9462.759945]  __schedule+0x28a/0x880
[ 9462.761414]  schedule+0x36/0x80
[ 9462.762835]  rwsem_down_write_failed+0x20d/0x380
[ 9462.764433]  call_rwsem_down_write_failed+0x17/0x30
[ 9462.766075]  ? __ceph_getxattr+0x340/0x340 [ceph]
[ 9462.767693]  down_write+0x2d/0x40
[ 9462.769175]  do_truncate+0x67/0xc0
[ 9462.770642]  path_openat+0xaba/0x13b0
[ 9462.772136]  do_filp_open+0x91/0x100
[ 9462.773616]  ? __check_object_size+0x159/0x190
[ 9462.775156]  ? __alloc_fd+0x46/0x170
[ 9462.776574]  do_sys_open+0x124/0x210
[ 9462.777972]  SyS_open+0x1e/0x20
[ 9462.779320]  do_syscall_64+0x67/0x150
[ 9462.780736]  entry_SYSCALL64_slow_path+0x25/0x25

[root@cephtest ~]# cat /proc/29541/stack
[<ffffffffa0567b53>] ceph_mdsc_do_request+0x183/0x240 [ceph]
[<ffffffffa054785c>] __ceph_setattr+0x3fc/0x8b0 [ceph]
[<ffffffffa0547d4c>] ceph_setattr+0x3c/0x60 [ceph]
[<ffffffff812623b6>] notify_change+0x266/0x440
[<ffffffff8123cd85>] do_truncate+0x75/0xc0
[<ffffffff8124f7aa>] path_openat+0xaba/0x13b0
[<ffffffff81251c81>] do_filp_open+0x91/0x100
[<ffffffff8123e304>] do_sys_open+0x124/0x210
[<ffffffff8123e40e>] SyS_open+0x1e/0x20
[<ffffffff81003a07>] do_syscall_64+0x67/0x150
[<ffffffff817b1427>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

[root@cephtest ~]# cat /proc/32738/stack
[<ffffffff8139a617>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffff8123cd77>] do_truncate+0x67/0xc0
[<ffffffff8124f7aa>] path_openat+0xaba/0x13b0
[<ffffffff81251c81>] do_filp_open+0x91/0x100
[<ffffffff8123e304>] do_sys_open+0x124/0x210
[<ffffffff8123e40e>] SyS_open+0x1e/0x20
[<ffffffff81003a07>] do_syscall_64+0x67/0x150
[<ffffffff817b1427>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
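
For reference, steps 1-3 can be driven with a small script split across the two hosts (a sketch only; dd for creating the 30G file and the i counter are assumptions, since neither is spelled out above):

#!/bin/bash
# --- run on host1 (kernel 4.12.8) ---
# step 1: create the 30G file (dd is an assumption; any method works)
dd if=/dev/zero of=/mnt/cephfs/a/test823 bs=1M count=30720
# step 2: copy it and list the directory (originally run from separate shells)
cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup &
ls -al /mnt/cephfs/a/*

# --- run on host2 (kernel 4.12.8) ---
# step 3: small-file copy loop plus a large read and a listing
i=0
while true; do
    cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i
    i=$((i+1))
done &
cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile &
ls -al /mnt/cephfs/a/*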

The ceph log shows:
f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
2017-08-24 17:16:00.219523 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000424 pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
2017-08-24 17:16:00.219534 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000521 pending pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
2017-08-24 17:16:00.219545 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000523 pending pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
2017-08-24 17:16:00.219574 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 10000000528 pending pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
2017-08-24 17:16:00.219592 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000052a pending pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
2017-08-24 17:16:00.219606 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000052c pending pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
2017-08-24 17:16:00.219618 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000052f pending pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
2017-08-24 17:16:00.219630 7f746db8f700  0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000000040b pending pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
2017-08-24 17:16:00.219741 7f746db8f700  0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for > 1941.487238 secs
2017-08-24 17:16:00.219753 7f746db8f700  0 log_channel(cluster) log [WRN] : slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: client_request(client.268101:1122217 getattr pAsLsXsFs #1000000040b 2017-08-24 16:44:00.463827) currently failed to rdlock, waiting
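
On the MDS side, the blocked request and the sessions holding the caps could be inspected via the admin socket, assuming these commands are available in this 0.94.9 MDS (mds.a is a placeholder for the actual daemon name):

ceph daemon mds.a dump_ops_in_flight   # should list the slow getattr on 1000000040b
ceph daemon mds.a session ls           # maps client.268101 / client.268113 back to hosts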

Thanks a lot.

     




donglifecomm@xxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
