hanging slow requests: failed to authpin, subtree is being exported

Hi all,

When syncing data with rsync, I often get blocked slow requests on the MDS, which also block all other access to the affected paths:

2019-09-23 11:25:49.477 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 31.895478 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:26:19.477 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 61.896079 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:27:19.478 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 121.897268 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:29:19.488 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 241.899467 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:33:19.680 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 482.087927 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:09.881 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 32.677511 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:39.881 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 62.678132 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:37:39.891 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 122.679273 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:39:39.892 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 242.684667 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:41:19.893 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 962.305681 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:43:39.923 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 482.712888 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:51:40.236 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 963.037049 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:57:20.308 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 1922.719287 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:07:40.621 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 1923.409501 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:29:20.639 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 3843.057602 seconds old, received at 2019-09-23 11:25:17.598152: client_request(client.38352684:92684 lookup #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:39:40.872 7f4f401e8700  0 log_channel(cluster) log [WRN] : slow request 3843.664914 seconds old, received at 2019-09-23 11:35:37.217113: client_request(client.38347357:111963 lookup #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, caller_gid=0{0,}) currently failed to authpin, subtree is being exported
If I try to ls these paths, the client hangs.
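
In case it is useful, I can also dump the stuck ops via the admin socket of the MDS that reports the slow requests (mds02 here; I am assuming the daemon is named after the host). I have not pasted the output, but can do so if it helps:

[root@mds02 ~]# ceph daemon mds.mds02 dump_blocked_ops
[root@mds02 ~]# ceph daemon mds.mds02 dump_ops_in_flight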

I tried this with the CentOS 7.6 ceph kernel client and now also with ceph-fuse 14.2.3, and I see the issue with both. Remounting did not solve it; if I restart the mds, the issue goes away, but only for some time.
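
By "restart the mds" I mean restarting the systemd unit on the node holding the stuck rank so that the standby takes over (assuming the standard unit naming here):

[root@mds02 ~]# systemctl restart ceph-mds@mds02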


[root@mds02 ~]# ceph -s
  cluster:
    id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
            1 MDSs report slow requests
            2 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
    mgr: mds02(active, since 4d), standbys: mds01, mds03
    mds: ceph_fs:2 {0=mds03=up:active,1=mds02=up:active} 1 up:standby
    osd: 535 osds: 535 up, 535 in

  data:
    pools:   3 pools, 3328 pgs
    objects: 375.14M objects, 672 TiB
    usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
    pgs:     3319 active+clean
             9    active+clean+scrubbing+deep

  io:
    client:   141 KiB/s rd, 54 MiB/s wr, 62 op/s rd, 577 op/s wr


[root@mds02 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsmds02(mds.1): 2 slow requests are blocked > 30 secs
MDS_TRIM 2 MDSs behind on trimming
    mdsmds02(mds.1): Behind on trimming (3407/200) max_segments: 200, num_segments: 3407
    mdsmds03(mds.0): Behind on trimming (4240/200) max_segments: 200, num_segments: 4240
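
I assume the trimming backlog is a consequence of the stuck requests rather than a separate problem, so I have not simply raised the limit, though I understand it can be changed at runtime, e.g. (400 is just an arbitrary example value):

[root@mds02 ~]# ceph config set mds mds_log_max_segments 400

Would that be advisable, or does it only hide the real issue?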

Can someone help me to debug this further?
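
Since the warning mentions a subtree export, I also wonder whether pinning the directories that rsync writes into to a single rank would work around the balancer migrating them mid-sync, something like (run on a client, with an example path):

# setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/backups

Is that a reasonable workaround, or should an export not block requests for this long in the first place?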

Thank you!

Kenneth

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



