Hi list,

I have a setup where two clients mount the same filesystem and read/write mostly non-overlapping subsets of files (Dovecot mail storage/indices). A third client takes backups by snapshotting the top-level directory, then rsyncing the snapshot over to another location. Ever since I switched the backup process to using snapshots, the rsync process has stalled at a certain point during the backup with a stuck MDS op:

root@mon02:~# ceph daemon mds.mon02 dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.146682828:199050 getattr pAsLsXsFs #0x10000000007//bak-20190612094501/<snip path>/dovecot.index.log 2019-06-12 12:20:56.992049 caller_uid=5000, caller_gid=5000{})",
            "initiated_at": "2019-06-12 12:20:57.001534",
            "age": 9563.847754,
            "duration": 9563.847780,
            "type_data": {
                "flag_point": "failed to rdlock, waiting",
                "reqid": "client.146682828:199050",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.146682828",
                    "tid": 199050
                },
                "events": [
                    {
                        "time": "2019-06-12 12:20:57.001534",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001534",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001538",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001550",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001713",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-06-12 12:20:57.001997",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

AIUI, when a snapshot is taken, all clients with dirty data are supposed to get a message to flush it to the cluster in order to produce a consistent snapshot. My guess is this isn't happening properly, so reads of that file in the snapshot are blocked.

Doing an 'echo 3 > /proc/sys/vm/drop_caches' on both of the writing clients seems to clear the stuck op, but doing it once isn't enough; usually I get the stuck op and have to clear caches twice after making any given snapshot.

Everything is on Ubuntu. The cluster is running 13.2.4 (mimic), and the clients are using the kernel client, on kernels 4.18.0-20-generic (writers) and 4.18.0-21-generic (backup host).

I managed to reproduce it like this:

host1$ mkdir _test
host1$ cd _test/.snap
host2$ cd _test
host2$ for i in $(seq 1 10000); do (sleep 0.1; echo $i; sleep 1) > b_$i & sleep 0.05; done

(while that is running)

host1$ mkdir s11
host1$ cd s11

(wait a few seconds)

host2$ ^C
host1$ ls -al
(hangs)

This yielded this stuck request:

{
    "ops": [
        {
            "description": "client_request(client.146687505:13785 getattr pAsLsXsFs #0x1000017f41c//s11/b_42 2019-06-12 16:15:59.095025 caller_uid=0, caller_gid=0{})",
            "initiated_at": "2019-06-12 16:15:59.095559",
            "age": 30.846294,
            "duration": 30.846318,
            "type_data": {
                "flag_point": "failed to rdlock, waiting",
                "reqid": "client.146687505:13785",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.146687505",
                    "tid": 13785
                },
                "events": [
                    {
                        "time": "2019-06-12 16:15:59.095559",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-06-12 16:15:59.095559",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-06-12 16:15:59.095562",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-06-12 16:15:59.095573",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-06-12 16:15:59.096201",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-06-12 16:15:59.096318",
                        "event": "failed to rdlock, waiting"
                    },
                    {
                        "time": "2019-06-12 16:15:59.268368",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

My guess is that somewhere along the line there's a race condition and the dirty client isn't properly flushing its data.
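In case it helps with debugging: one way to see which client still holds caps on the stuck file should be to dump the MDS cache and look for the inode from the op description. This is just a rough sketch; mds.mon02 is the MDS from my setup above, and the grep pattern/output path are only examples:

root@mon02:~# ceph daemon mds.mon02 dump cache /tmp/mdscache.txt   # write the full MDS cache to a file on the MDS host
root@mon02:~# grep b_42 /tmp/mdscache.txt                          # find the inode's entry, which lists the caps and the client.XXX ids holding them
root@mon02:~# ceph daemon mds.mon02 session ls                     # map client.XXX ids back to mounts/addresses

If the inode's entry lists caps still issued to the writer's client id, that would support the theory that it's the writer that never flushed.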
A 'sync' on host2 does not clear the stuck op. 'echo 1 > /proc/sys/vm/drop_caches' (page cache only) does not clear it either, while 'echo 2 > /proc/sys/vm/drop_caches' (dentries and inodes) does fix it. So I guess the problem is a dentry/inode that is stuck dirty in the cache of host2?

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub