On Mon, Sep 23, 2019 at 4:14 AM Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
>
> Hi all,
>
> When syncing data with rsync, I'm often getting blocked slow requests,
> which also block access to this path.
>
> > 2019-09-23 11:25:49.477 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 31.895478 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:26:19.477 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 61.896079 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:27:19.478 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 121.897268 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:29:19.488 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 241.899467 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:33:19.680 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 482.087927 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:36:09.881 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 32.677511 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:36:39.881 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 62.678132 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:37:39.891 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 122.679273 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:39:39.892 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 242.684667 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:41:19.893 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 962.305681 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:43:39.923 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 482.712888 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:51:40.236 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 963.037049 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:57:20.308 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 1922.719287 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 12:07:40.621 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 1923.409501 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 12:29:20.639 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 3843.057602 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 12:39:40.872 7f4f401e8700 0 log_channel(cluster) log [WRN]
> > : slow request 3843.664914 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
>
> If I try to ls these paths, the client hangs.
> I tried this using the CentOS 7.6 ceph kernel client, and now also with
> ceph-fuse 14.2.3; I see the issue with both. Remounting did not solve
> the issue, but if I restart the MDS, the issue goes away - for some time.
>
> > [root@mds02 ~]# ceph -s
> >   cluster:
> >     id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
> >     health: HEALTH_WARN
> >             1 MDSs report slow requests
> >             2 MDSs behind on trimming
> >
> >   services:
> >     mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
> >     mgr: mds02(active, since 4d), standbys: mds01, mds03
> >     mds: ceph_fs:2 {0=mds03=up:active,1=mds02=up:active} 1 up:standby
> >     osd: 535 osds: 535 up, 535 in
> >
> >   data:
> >     pools:   3 pools, 3328 pgs
> >     objects: 375.14M objects, 672 TiB
> >     usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
> >     pgs:     3319 active+clean
> >              9    active+clean+scrubbing+deep
> >
> >   io:
> >     client: 141 KiB/s rd, 54 MiB/s wr, 62 op/s rd, 577 op/s wr
>
> > [root@mds02 ~]# ceph health detail
> > HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdsmds02(mds.1): 2 slow requests are blocked > 30 secs
> > MDS_TRIM 2 MDSs behind on trimming
> >     mdsmds02(mds.1): Behind on trimming (3407/200) max_segments: 200,
> >     num_segments: 3407
> >     mdsmds03(mds.0): Behind on trimming (4240/200) max_segments: 200,
> >     num_segments: 4240
>
> Can someone help me to debug this further?

What is the make-up of your cluster? It sounds like it may be all HDD.
If so, try adding this to /etc/ceph/ceph.conf on your OSDs and restart
the processes:

    osd op queue cut off = high

Depending on your version (it is the default in newer versions), adding

    osd op queue = wpq

can also help.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
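[Editor's sketch of the suggested change: combining both options from the reply above, the fragment would go in the [osd] section of /etc/ceph/ceph.conf on each OSD host. The section name and option names are standard Ceph config syntax, but verify that your release supports them; both options require restarting the ceph-osd daemons to take effect.]

    [osd]
    # Prioritize client ops over recovery/low-priority ops; helps
    # HDD-backed OSDs under load. Requires an OSD restart.
    osd op queue = wpq
    osd op queue cut off = high

After restarting, the active values can be confirmed through the admin socket on an OSD host, e.g. `ceph daemon osd.0 config get osd_op_queue` (osd.0 is a placeholder for any local OSD id).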