Re: Ceph MDS WRN replayed op client.$id

On Thu, Sep 13, 2018 at 11:01 AM Stefan Kooman <stefan@xxxxxx> wrote:
>
> Hi John,
>
> Quoting John Spray (jspray@xxxxxxxxxx):
>
> > On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman <stefan@xxxxxx> wrote:
> >
> > When replaying a journal (either on MDS startup or on a standby-replay
> > MDS), the replayed file creation operations are being checked for
> > consistency with the state of the replayed client sessions.  Client
> > sessions have a "preallocated_inos" list that contains a set of inode
> > numbers they should be using to create new files.
> >
> > There are two checks being done: a soft check (just log it) that the
> > inode used for a new file is the same one that the session would be
> > expected to use for a new file, and a hard check (assertion) that the
> > inode used is one of the inode numbers that can be used for a new
> > file.  When that soft check fails, it doesn't indicate anything
> > inconsistent in the metadata, just that the inodes are being used in
> > an unexpected order.
> >
> > The WRN severity message mainly benefits our automated testing -- the
> > hope would be that if we're hitting strange scenarios like this in
> > automated tests then it would trigger a test failure (we fail tests
> > if they emit unexpected warnings).
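
To make that a bit more concrete, the two checks amount to roughly the
following (an illustrative Python sketch of the logic described above,
not the actual MDS C++ code; the names are made up for the example):

    def check_replayed_create(preallocated_inos, used_ino, client_id):
        # preallocated_inos: ordered list of inode numbers the replayed
        # session is expected to draw from, front first.

        # Soft check: only log a warning if the inode is not the *next*
        # expected one -- the inodes were merely used in an unexpected
        # order.
        if used_ino != preallocated_inos[0]:
            print("WRN: replayed op client.%d used ino %#x, expected %#x"
                  % (client_id, used_ino, preallocated_inos[0]))

        # Hard check: assert that the inode is at least one of the
        # preallocated numbers; anything else would indicate a real
        # metadata inconsistency.
        assert used_ino in preallocated_inos
        preallocated_inos.remove(used_ino)
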
>
> Thanks for the explanation.
>
> > It would be interesting to know more about what's going on on your
> > cluster when this is happening -- do you have standby replay MDSs?
> > Multiple active MDSs?  Were any daemons failing over at a similar time
> > to the warnings?  Did you have anything funny going on with clients
> > (like forcing them to reconnect after being evicted)?
>
> Two MDSs in total. One active, one standby-replay. The clients are doing
> "funny" stuff. We are testing "CTDB" [1] in combination with cephfs to
> build an HA setup (to prevent split brain). We have two clients that,
> in case of a failure, need to acquire a lock on a file
> "ctdb_recovery_lock" before doing a recovery. Somehow, while
> configuring this setup, we triggered the "replayed op" warnings. We
> have tried to reproduce that, but no matter what we do the "replayed
> op" warnings do not occur anymore ...
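
(For context, with a file-based recovery lock this normally just means
pointing ctdb at a path on the cephfs mount, along these lines -- the
exact option name depends on the ctdb version, and the path here is
only an example:

    CTDB_RECOVERY_LOCK=/mnt/cephfs/ctdb_recovery_lock

in /etc/ctdb/ctdbd.conf, or the equivalent "recovery lock" setting in
newer ctdb releases.)
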
>
> We have seen these warnings before (from other clients). The warnings
> started after we had switched from mds1 -> mds2 (an upgrade of the
> Ceph cluster according to the MDS upgrade procedure, with reboots
> afterwards, hence the failover).
>
> Something I just realised is that _only_ the standby-replay MDS
> is emitting the warnings, not the active MDS.
>
> Not related to the "replayed op" warning, but related to the CTDB "lock
> issue":
>
> The "surviving" cephfs client tries to acquire a lock on a file, but
> although the other client is dead (but not yet evicted by the MDS) it
> can't. Not until the dead client is evicted by the MDS after ~ 300 sec
> (mds_session_autoclose=300). Turns out ctdb uses fcntl() locking. Does
> cephfs support this kind of locking in the way ctdb expects it to?
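
For reference, the fcntl() locking that ctdb does amounts to something
like this from a client's point of view (an illustrative Python sketch;
the mount path is made up):

    import fcntl

    # Open the recovery lock file on the cephfs mount and try to take an
    # exclusive lock without blocking, roughly as ctdb's mutex helper does.
    f = open("/mnt/cephfs/ctdb_recovery_lock", "a")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("got the recovery lock")
    except OSError:
        # Another client still holds the lock -- possibly a dead client
        # whose MDS session has not been evicted yet.
        print("lock is held elsewhere")
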

We implement locking, and it's correct that another client can't gain
the lock until the first client is evicted.  Aside from speeding up
eviction by modifying the timeout, if you have another mechanism for
detecting node failure then you could use that to explicitly evict the
client.
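
For example, something along these lines (assuming a reasonably recent
release; the MDS name and session id are placeholders):

    ceph tell mds.<name> session ls
    ceph tell mds.<name> client evict id=<session id>

and the ~300 second timeout you mention is the mds_session_autoclose
setting, which can be lowered if you want stale sessions to be evicted
sooner.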

John

> In the meantime we will try [2] (a RADOS object) as the recovery lock.
> That would eliminate a layer / dependency as well.
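
For what it's worth, the manpage in [2] boils down to setting the
recovery lock to the helper command instead of a file path, roughly
(cluster name, user, pool and object below are placeholders, and the
helper's install path varies by distribution):

    recovery lock = !/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.ctdb ctdb_pool ctdb_reclock

where the leading "!" tells ctdb to treat the setting as a mutex helper
command rather than a lock file.
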
>
> Thanks,
>
> Gr. Stefan
>
> [1]: https://ctdb.samba.org/
> [2]: https://ctdb.samba.org/manpages/ctdb_mutex_ceph_rados_helper.7.html
>
> --
> | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
> | GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


