Re: hanging slow requests: failed to authpin, subtree is being exported

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 23, 2019 at 4:14 AM Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
>
> Hi all,
>
> When syncing data with rsync, I'm often getting blocked slow requests,
> which also block access to this path.
>
> > 2019-09-23 11:25:49.477 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 31.895478 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:26:19.477 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 61.896079 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:27:19.478 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 121.897268 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:29:19.488 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 241.899467 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:33:19.680 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 482.087927 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:36:09.881 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 32.677511 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:36:39.881 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 62.678132 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:37:39.891 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 122.679273 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:39:39.892 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 242.684667 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:41:19.893 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 962.305681 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:43:39.923 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 482.712888 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:51:40.236 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 963.037049 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 11:57:20.308 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 1922.719287 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 12:07:40.621 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 1923.409501 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 12:29:20.639 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 3843.057602 seconds old, received at 2019-09-23
> > 11:25:17.598152: client_request(client.38352684:92684 lookup
> > #0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> > 2019-09-23 12:39:40.872 7f4f401e8700  0 log_channel(cluster) log [WRN]
> > : slow request 3843.664914 seconds old, received at 2019-09-23
> > 11:35:37.217113: client_request(client.38347357:111963 lookup
> > #0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
> > caller_gid=0{0,}) currently failed to authpin, subtree is being exported
> If I try to ls this paths, the client will hang.
>
> I tried this using ceph kernel client of centos7.6 and now also with the
> ceph-fuse of 14.2.3, I see the issue with both. I tried remounting, but
> that did not solve the issue, if I restart the mds, the issue goes away
> - for some time
>
>
> > [root@mds02 ~]# ceph -s
> >   cluster:
> >     id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
> >     health: HEALTH_WARN
> >             1 MDSs report slow requests
> >             2 MDSs behind on trimming
> >
> >   services:
> >     mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
> >     mgr: mds02(active, since 4d), standbys: mds01, mds03
> >     mds: ceph_fs:2 {0=mds03=up:active,1=mds02=up:active} 1 up:standby
> >     osd: 535 osds: 535 up, 535 in
> >
> >   data:
> >     pools:   3 pools, 3328 pgs
> >     objects: 375.14M objects, 672 TiB
> >     usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
> >     pgs:     3319 active+clean
> >              9    active+clean+scrubbing+deep
> >
> >   io:
> >     client:   141 KiB/s rd, 54 MiB/s wr, 62 op/s rd, 577 op/s wr
> >
>
> > [root@mds02 ~]# ceph health detail
> > HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdsmds02(mds.1): 2 slow requests are blocked > 30 secs
> > MDS_TRIM 2 MDSs behind on trimming
> >     mdsmds02(mds.1): Behind on trimming (3407/200) max_segments: 200,
> > num_segments: 3407
> >     mdsmds03(mds.0): Behind on trimming (4240/200) max_segments: 200,
> > num_segments: 4240
>
> Can someone help me to debug this further?

What is the make up of your cluster? It sounds like it may be all HDD.

If so, try adding this to /etc/ceph/ceph.conf on your OSDs and restart
the processes.

osd op queue cut off = high

Depending on your version (default in newer versions), adding

osd op queue = wpq

can also help.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux