On Wed, Jun 12, 2019 at 3:26 PM Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>
> Hi list,
>
> I have a setup where two clients mount the same filesystem and
> read/write from mostly non-overlapping subsets of files (Dovecot mail
> storage/indices). There is a third client that takes backups by
> snapshotting the top-level directory, then rsyncing the snapshot over
> to another location.
>
> Ever since I switched the backup process to using snapshots, the rsync
> process has stalled at a certain point during the backup with a stuck
> MDS op:
>
> root@mon02:~# ceph daemon mds.mon02 dump_ops_in_flight
> {
>     "ops": [
>         {
>             "description": "client_request(client.146682828:199050
>                 getattr pAsLsXsFs #0x10000000007//bak-20190612094501/<snip
>                 path>/dovecot.index.log 2019-06-12 12:20:56.992049
>                 caller_uid=5000, caller_gid=5000{})",
>             "initiated_at": "2019-06-12 12:20:57.001534",
>             "age": 9563.847754,
>             "duration": 9563.847780,
>             "type_data": {
>                 "flag_point": "failed to rdlock, waiting",
>                 "reqid": "client.146682828:199050",
>                 "op_type": "client_request",
>                 "client_info": {
>                     "client": "client.146682828",
>                     "tid": 199050
>                 },
>                 "events": [
>                     {
>                         "time": "2019-06-12 12:20:57.001534",
>                         "event": "initiated"
>                     },
>                     {
>                         "time": "2019-06-12 12:20:57.001534",
>                         "event": "header_read"
>                     },
>                     {
>                         "time": "2019-06-12 12:20:57.001538",
>                         "event": "throttled"
>                     },
>                     {
>                         "time": "2019-06-12 12:20:57.001550",
>                         "event": "all_read"
>                     },
>                     {
>                         "time": "2019-06-12 12:20:57.001713",
>                         "event": "dispatched"
>                     },
>                     {
>                         "time": "2019-06-12 12:20:57.001997",
>                         "event": "failed to rdlock, waiting"
>                     }
>                 ]
>             }
>         }
>     ],
>     "num_ops": 1
> }
>
> As I understand it, when a snapshot is taken, all clients with dirty
> data are supposed to get a message to flush it to the cluster in order
> to produce a consistent snapshot. My guess is this isn't happening
> properly, so reads of that file in the snapshot are blocked. Running
> 'echo 3 > /proc/sys/vm/drop_caches' on both of the writing clients
> seems to clear the stuck op, but doing it once isn't enough; usually I
> get the stuck op and have to clear caches twice after making any given
> snapshot.
>
> Everything is on Ubuntu. The cluster is running 13.2.4 (mimic), and the
> clients are using kernel client versions 4.18.0-20-generic (writers)
> and 4.18.0-21-generic (backup host).
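
On the workaround side, the cache drop described above can be scripted so it
only runs when the MDS actually reports a blocked request. This is a minimal,
untested sketch: mds.mon02 comes from the dump above, while the writer1/writer2
host names, root ssh access, and the use of jq are assumptions for illustration.

#!/bin/bash
# Untested workaround sketch: if the MDS reports a request stuck at
# "failed to rdlock, waiting", sync and drop dentry/inode caches on the
# writing clients. Run on the MDS host (uses the local admin socket).

WRITERS="writer1 writer2"   # hypothetical names of the two writing clients
MDS_NAME="mds.mon02"        # from the dump_ops_in_flight output above

stuck=$(ceph daemon "$MDS_NAME" dump_ops_in_flight |
        jq -r '.ops[]
               | select(.type_data.flag_point == "failed to rdlock, waiting")
               | .description')

if [ -n "$stuck" ]; then
    echo "Stuck op(s) detected:"
    echo "$stuck"
    for host in $WRITERS; do
        # echo 2 (drop dentries/inodes) turns out to be enough further down
        # in this thread; the original workaround used echo 3.
        ssh "root@$host" 'sync; echo 2 > /proc/sys/vm/drop_caches'
    done
fi

This only papers over the problem by forcing the writers' kernel clients to
give up the dirty dentry/inode; as noted above it may need to run more than
once per snapshot, and the real fix is on the MDS side (see the patch further
down in this thread).
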
>
> I managed to reproduce it like this:
>
> host1$ mkdir _test
> host1$ cd _test/.snap
>
> host2$ cd _test
> host2$ for i in $(seq 1 10000); do (sleep 0.1; echo $i; sleep 1) > b_$i & sleep 0.05; done
>
> (while that is running)
>
> host1$ mkdir s11
> host1$ cd s11
>
> (wait a few seconds)
>
> host2$ ^C
>
> host1$ ls -al
> (hangs)
>
> This yielded this stuck request:
>
> {
>     "ops": [
>         {
>             "description": "client_request(client.146687505:13785
>                 getattr pAsLsXsFs #0x1000017f41c//s11/b_42 2019-06-12
>                 16:15:59.095025 caller_uid=0, caller_gid=0{})",
>             "initiated_at": "2019-06-12 16:15:59.095559",
>             "age": 30.846294,
>             "duration": 30.846318,
>             "type_data": {
>                 "flag_point": "failed to rdlock, waiting",
>                 "reqid": "client.146687505:13785",
>                 "op_type": "client_request",
>                 "client_info": {
>                     "client": "client.146687505",
>                     "tid": 13785
>                 },
>                 "events": [
>                     {
>                         "time": "2019-06-12 16:15:59.095559",
>                         "event": "initiated"
>                     },
>                     {
>                         "time": "2019-06-12 16:15:59.095559",
>                         "event": "header_read"
>                     },
>                     {
>                         "time": "2019-06-12 16:15:59.095562",
>                         "event": "throttled"
>                     },
>                     {
>                         "time": "2019-06-12 16:15:59.095573",
>                         "event": "all_read"
>                     },
>                     {
>                         "time": "2019-06-12 16:15:59.096201",
>                         "event": "dispatched"
>                     },
>                     {
>                         "time": "2019-06-12 16:15:59.096318",
>                         "event": "failed to rdlock, waiting"
>                     },
>                     {
>                         "time": "2019-06-12 16:15:59.268368",
>                         "event": "failed to rdlock, waiting"
>                     }
>                 ]
>             }
>         }
>     ],
>     "num_ops": 1
> }
>
> My guess is that somewhere along the line of this process there's a race
> condition and the dirty client isn't properly flushing its data.
>
> A 'sync' on host2 does not clear the stuck op. 'echo 1 >
> /proc/sys/vm/drop_caches' does not either, while 'echo 2 >
> /proc/sys/vm/drop_caches' does fix it. So I guess the problem is a
> dentry/inode that is stuck dirty in the cache of host2?
>

I have tracked down the bug. Thank you for reporting this. Running
'echo 2 > /proc/sys/vm/drop_caches' should clear the hang. If you can
compile Ceph from source, please try the following patch:

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index ecd06294fa..94b947975a 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -2956,7 +2956,8 @@ void Locker::handle_client_caps(MClientCaps *m)
     // client flushes and releases caps at the same time. make sure MDCache::cow_inode()
     // properly setup CInode::client_need_snapflush
-    if ((m->get_dirty() & ~cap->issued()) && !need_snapflush)
+    if (!need_snapflush && (m->get_dirty() & ~cap->issued()) &&
+        (m->flags & MClientCaps::FLAG_PENDING_CAPSNAP))
       cap->mark_needsnapflush();
   }
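
If it helps when testing the patch, whether the hang is gone can be checked
from the backup host right after taking a snapshot. A minimal, untested sketch
follows; the snapshot path argument and the 60-second timeout are illustrative
choices, not something from this thread.

#!/bin/bash
# Untested check: stat everything inside a freshly created snapshot and
# treat a timeout as the "failed to rdlock, waiting" hang from this thread.

SNAPDIR=${1:?usage: $0 /path/to/dir/.snap/snapname}

# Recursively stat the snapshot; this exercises the same kind of getattr
# that got stuck above. On a fixed MDS it should complete promptly.
if timeout 60 find "$SNAPDIR" -exec stat {} + > /dev/null; then
    echo "OK: snapshot is fully readable"
else
    echo "Possible hang: reads under $SNAPDIR did not finish within 60s."
    echo "Check the MDS with: ceph daemon mds.<name> dump_ops_in_flight"
    exit 1
fi
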