Re: getattr - failed to rdlock waiting

Gregory Farnum <gfarnum@xxxxxxxxxx> · Wed, 3 Oct 2018 08:42:34 -0700

On Tue, Oct 2, 2018 at 12:18 PM Thomas Sumpter <thomas.sumpter@xxxxxxxxxx> wrote:

Hi Folks, 

I am looking for advice on how to troubleshoot some long operations found in MDS. Most of the time performance is fantastic, but occasionally and to no real pattern or trend, a gettattr op will take up to ~30
 seconds to complete in MDS which is stuck on "event": "failed to rdlock, waiting"

E.g.
"description": "client_request(client.84183:54794012 getattr pAsLsXsFs #0x10000038585 2018-10-02 07:56:27.554282 caller_uid=48, caller_gid=48{})",
"duration": 28.987992,
{
"time": "2018-09-25 07:56:27.552511",
"event": "failed to rdlock, waiting"
},
{
"time": "2018-09-25 07:56:56.529748",
"event": "failed to rdlock, waiting"
},
{
"time": "2018-09-25 07:56:56.540386",
"event": "acquired locks"
}

I can find no corresponding long op on any of the OSDs and no other op in MDS which this one could be waiting for.
Nearly all configuration will be the default. Currently have a small amount of data which is constantly being updated. 1 data pool and 1 metadata pool.
How can I track down what is holding up this op and try to stop it happening?

This is a weakness in the MDS introspection right now, unfortunately.

What the error message literally means is what it says — the op needs to get a read lock, but it can't, so it's waiting. This might mean that there's an MDS op in progress, but it usually means there's a client which is holding "write capabilities" on the inode in question, and it's asking for/waiting for that client to drop those capabilities.

This might take a while because of a buggy client, or because the client had a very large amount of buffered writes it is now frantically trying to flush out to RADOS as fast as it can.
-Greg

# rados df
…
total_objects    191
total_used       5.7 GiB
total_avail      367 GiB
total_space      373 GiB

Cephfs version 13.2.1 on CentOs 7.5
Kernel: 3.10.0-862.11.6.el7.x86_64
1x Active MDS, 1x Replay Standby MDS
3x MON
4x OSD
Bluestore FS

Ceph kernel client on CentOs 7.4
Kernel: 4.18.7-1.el7.elrepo.x86_64  (almost the latest, should be good?)

Many Thanks!
Tom

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com