Hi all,
When syncing data with rsync, I often get blocked slow requests, which also block access to the path being synced.
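For reference, the transfer is a plain recursive rsync into the CephFS mount; the exact paths below are placeholders, but the invocation is roughly:

    rsync -av /some/source/ /mnt/cephfs/target/

While the sync runs, the MDS logs warnings like these: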
2019-09-23 11:25:49.477 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 31.895478 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:26:19.477 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 61.896079 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:27:19.478 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 121.897268 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:29:19.488 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 241.899467 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:33:19.680 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 482.087927 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:09.881 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 32.677511 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:39.881 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 62.678132 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:37:39.891 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 122.679273 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:39:39.892 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 242.684667 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:41:19.893 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 962.305681 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:43:39.923 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 482.712888 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:51:40.236 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 963.037049 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:57:20.308 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 1922.719287 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:07:40.621 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 1923.409501 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:29:20.639 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 3843.057602 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:39:40.872 7f4f401e8700 0 log_channel(cluster) log [WRN]
: slow request 3843.664914 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
If I try to ls these paths, the client hangs.

I have tried this with the CentOS 7.6 ceph kernel client and now also with ceph-fuse 14.2.3, and I see the issue with both. Remounting did not solve it; if I restart the MDS, the issue goes away, but only for some time.
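For completeness, this is roughly what I did; the mount point and daemon name are from my setup:

    # remount the client (same result with the kernel client via mount -t ceph)
    umount /mnt/cephfs
    ceph-fuse /mnt/cephfs

    # restarting the active MDS clears the slow requests for a while
    systemctl restart ceph-mds@mds02

The cluster status while the requests are blocked: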
[root@mds02 ~]# ceph -s
  cluster:
    id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
            1 MDSs report slow requests
            2 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
    mgr: mds02(active, since 4d), standbys: mds01, mds03
    mds: ceph_fs:2 {0=mds03=up:active,1=mds02=up:active} 1 up:standby
    osd: 535 osds: 535 up, 535 in

  data:
    pools:   3 pools, 3328 pgs
    objects: 375.14M objects, 672 TiB
    usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
    pgs:     3319 active+clean
             9    active+clean+scrubbing+deep

  io:
    client: 141 KiB/s rd, 54 MiB/s wr, 62 op/s rd, 577 op/s wr
[root@mds02 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsmds02(mds.1): 2 slow requests are blocked > 30 secs
MDS_TRIM 2 MDSs behind on trimming
    mdsmds02(mds.1): Behind on trimming (3407/200) max_segments: 200, num_segments: 3407
    mdsmds03(mds.0): Behind on trimming (4240/200) max_segments: 200, num_segments: 4240
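If it helps, I can also attach the in-flight ops and client sessions from the MDS admin socket; I would collect them with something like this (run on the node hosting mds.mds02):

    ceph daemon mds.mds02 dump_ops_in_flight
    ceph daemon mds.mds02 session ls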
Can someone help me to debug this further?
Thank you!
Kenneth