Hi list,

I have a setup where two clients mount the same filesystem and read/write mostly non-overlapping subsets of files (Dovecot mail storage/indices). A third client takes backups by snapshotting the top-level directory, then rsyncing the snapshot over to another location. Ever since I switched the backup process to using snapshots, the rsync process has stalled at a certain point during the backup with a stuck MDS op:

root@mon02:~# ceph daemon mds.mon02 dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.146682828:199050 getattr pAsLsXsFs #0x10000000007//bak-20190612094501/<snip path>/dovecot.index.log 2019-06-12 12:20:56.992049 caller_uid=5000, caller_gid=5000{})",
            "initiated_at": "2019-06-12 12:20:57.001534",
            "age": 9563.847754,
            "duration": 9563.847780,
            "type_data": {
                "flag_point": "failed to rdlock, waiting",
                "reqid": "client.146682828:199050",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.146682828",
                    "tid": 199050
                },
                "events": [
                    {
                        "time": "2019-06-12 12:20:57.001534",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001534",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001538",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001550",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001713",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001997",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

AIUI, when a snapshot is taken, all clients with dirty data are supposed to get a message to flush it to the cluster in order to produce a consistent snapshot. My guess is this isn't happening properly, so reads of that file in the snapshot are blocked.

Doing an 'echo 3 > /proc/sys/vm/drop_caches' on both of the writing clients seems to clear the stuck op, but doing it once isn't enough; usually I get the stuck op and have to clear caches twice after making any given snapshot.

Everything is on Ubuntu. The cluster is running 13.2.4 (mimic), and the clients are using the kernel client, on kernels 4.18.0-20-generic (writers) and 4.18.0-21-generic (backup host).

I managed to reproduce it like this:

host1$ mkdir _test
host1$ cd _test/.snap
host2$ cd _test
host2$ for i in $(seq 1 10000); do (sleep 0.1; echo $i; sleep 1) > b_$i & sleep 0.05; done

(while that is running)

host1$ mkdir s11
host1$ cd s11

(wait a few seconds)

host2$ ^C
host1$ ls -al
(hangs)

This yielded this stuck request:

{
    "ops": [
        {
            "description": "client_request(client.146687505:13785 getattr pAsLsXsFs #0x1000017f41c//s11/b_42 2019-06-12 16:15:59.095025 caller_uid=0, caller_gid=0{})",
            "initiated_at": "2019-06-12 16:15:59.095559",
            "age": 30.846294,
            "duration": 30.846318,
            "type_data": {
                "flag_point": "failed to rdlock, waiting",
                "reqid": "client.146687505:13785",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.146687505",
                    "tid": 13785
                },
                "events": [
                    {
                        "time": "2019-06-12 16:15:59.095559",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-06-12 16:15:59.095559",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-06-12 16:15:59.095562",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-06-12 16:15:59.095573",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-06-12 16:15:59.096201",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-06-12 16:15:59.096318",
                        "event": "failed to rdlock, waiting"
                    },
                    {
                        "time": "2019-06-12 16:15:59.268368",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

My guess is that somewhere along the line there's a race condition and the dirty client isn't properly flushing its data.
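In case it helps with debugging: one way to see which client still holds caps on the stuck file should be to dump the MDS cache and look for the inode from the op description. This is just a rough sketch; mds.mon02 is the MDS from my setup above, and the grep pattern/output path are only examples:

root@mon02:~# ceph daemon mds.mon02 dump cache /tmp/mdscache.txt   # write the full MDS cache to a file on the MDS host
root@mon02:~# grep b_42 /tmp/mdscache.txt                          # find the inode's entry, which lists the caps and the client.XXX ids holding them
root@mon02:~# ceph daemon mds.mon02 session ls                     # map client.XXX ids back to mounts/addresses

If the inode's entry lists caps still issued to the writer's client id, that would support the theory that it's the writer that never flushed.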
A 'sync' on host2 does not clear the stuck op. 'echo 1 > /proc/sys/vm/drop_caches' (page cache only) does not clear it either, while 'echo 2 > /proc/sys/vm/drop_caches' (dentries and inodes) does fix it. So I guess the problem is a dentry/inode that is stuck dirty in the cache of host2?

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub