On Wed, Sep 19, 2018 at 10:37 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi John,
>
> > I'm not 100% sure of that. It could be that there's a path through
> > the code that's healthy, but just wasn't anticipated at the point that
> > warning message was added. I wish I had a more unambiguous response
> > to give!
>
> then I guess we'll just keep ignoring these warnings from the replay
> mds until we hit a real issue. ;-)
>
> It's probably impossible to predict any improvement on this with mimic, right?

Yeah, since we haven't knowingly done anything about it, it would be a
(pleasant) surprise if it was accidentally resolved in mimic ;-)

John

> Regards,
> Eugen
>
>
> Zitat von John Spray <jspray@xxxxxxxxxx>:
>
> > On Mon, Sep 17, 2018 at 2:49 PM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> from your response I understand that these messages are not expected
> >> if everything is healthy.
> >
> > I'm not 100% sure of that. It could be that there's a path through
> > the code that's healthy, but just wasn't anticipated at the point that
> > warning message was added. I wish I had a more unambiguous response
> > to give!
> >
> > John
> >
> >> We face them every now and then, three or four times a week, but
> >> there's no real connection to specific jobs or a high load in our
> >> cluster. It's a Luminous cluster (12.2.7) with 1 active, 1
> >> standby-replay and 1 standby MDS.
> >> Since it's only the replay server reporting this and the failover
> >> works fine we didn't really bother. But what can we do to prevent this
> >> from happening? The messages appear quite randomly, so I don't really
> >> know when to increase the debug log level.
> >>
> >> Any hint would be highly appreciated!
> >>
> >> Regards,
> >> Eugen
> >>
> >>
> >> Zitat von John Spray <jspray@xxxxxxxxxx>:
> >>
> >> > On Thu, Sep 13, 2018 at 11:01 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >> >>
> >> >> Hi John,
> >> >>
> >> >> Quoting John Spray (jspray@xxxxxxxxxx):
> >> >>
> >> >> > On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman <stefan@xxxxxx> wrote:
> >> >> >
> >> >> > When replaying a journal (either on MDS startup or on a standby-replay
> >> >> > MDS), the replayed file creation operations are being checked for
> >> >> > consistency with the state of the replayed client sessions. Client
> >> >> > sessions have a "preallocated_inos" list that contains a set of inode
> >> >> > numbers they should be using to create new files.
> >> >> >
> >> >> > There are two checks being done: a soft check (just log it) that the
> >> >> > inode used for a new file is the same one that the session would be
> >> >> > expected to use for a new file, and a hard check (assertion) that the
> >> >> > inode used is one of the inode numbers that can be used for a new
> >> >> > file. When that soft check fails, it doesn't indicate anything
> >> >> > inconsistent in the metadata, just that the inodes are being used in
> >> >> > an unexpected order.
> >> >> >
> >> >> > The WRN severity message mainly benefits our automated testing -- the
> >> >> > hope would be that if we're hitting strange scenarios like this in
> >> >> > automated tests then it would trigger a test failure (we fail tests
> >> >> > if they emit unexpected warnings).
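
For readers who want to see the shape of the logic John describes, here is a
rough, self-contained sketch of the two checks. This is not the actual Ceph
MDS code -- the type and member names (ReplaySession, prealloc_inos,
replay_alloc) are invented for the example -- but it shows the idea: the soft
check only logs a warning when inodes are consumed out of the expected order,
while the hard check asserts that a replayed create used an inode from the
session's preallocated set at all.

// Illustrative sketch only -- not the actual Ceph code; the names used
// here (ReplaySession, prealloc_inos, replay_alloc) are invented.
#include <cassert>
#include <cstdint>
#include <deque>
#include <iostream>

using inodeno_t = uint64_t;

struct ReplaySession {
  // Inode numbers preallocated to this client for creating new files.
  std::deque<inodeno_t> prealloc_inos;

  // Called while replaying a journaled "create" that consumed inode `used`.
  void replay_alloc(inodeno_t used) {
    // Soft check: normally the next preallocated inode is the one consumed.
    // A mismatch only means the inodes were used in an unexpected order,
    // so we just log a warning -- the metadata itself is still consistent.
    if (prealloc_inos.empty() || prealloc_inos.front() != used) {
      std::cerr << "WRN: replayed op used ino 0x" << std::hex << used
                << std::dec << ", not the expected next preallocated ino\n";
    }

    // Hard check: the inode must at least be somewhere in the session's
    // preallocated set; anything else would be a real inconsistency.
    bool found = false;
    for (auto it = prealloc_inos.begin(); it != prealloc_inos.end(); ++it) {
      if (*it == used) {
        prealloc_inos.erase(it);
        found = true;
        break;
      }
    }
    assert(found && "replayed op used an inode outside the preallocated set");
  }
};

int main() {
  ReplaySession s;
  s.prealloc_inos = {0x1000, 0x1001, 0x1002};
  s.replay_alloc(0x1000);  // expected order: silent
  s.replay_alloc(0x1002);  // out of order: soft check warns, hard check passes
  s.replay_alloc(0x1001);  // now the front of the remaining set: silent
  return 0;
}

The warning discussed in this thread corresponds to the soft check; the fact
that failover keeps working fine is consistent with the hard check never
firing.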
> >> >>
> >> >> Thanks for the explanation.
> >> >>
> >> >> > It would be interesting to know more about what's going on on your
> >> >> > cluster when this is happening -- do you have standby replay MDSs?
> >> >> > Multiple active MDSs? Were any daemons failing over at a similar time
> >> >> > to the warnings? Did you have anything funny going on with clients
> >> >> > (like forcing them to reconnect after being evicted)?
> >> >>
> >> >> Two MDSs in total. One active, one standby-replay. The clients are doing
> >> >> "funny" stuff. We are testing "CTDB" [1] in combination with cephfs to
> >> >> build a HA setup (to prevent split brain). We have two clients that, in
> >> >> case of a failure, need to acquire a lock on a file "ctdb_recovery_lock"
> >> >> before doing a recovery. Somehow, while configuring this setup, we
> >> >> triggered the "replayed op" warnings. We have tried to reproduce that,
> >> >> but no matter what we do the "replayed op" warnings do not occur
> >> >> anymore ...
> >> >>
> >> >> We have seen these warnings before (other clients). Warnings started
> >> >> after we had switched from mds1 -> mds2 (upgrade of Ceph cluster
> >> >> according to MDS upgrade procedure, reboots afterwards, hence the
> >> >> failover).
> >> >>
> >> >> Something I just realised is that _only_ the standby-replay MDS
> >> >> is emitting the warnings, not the active MDS.
> >> >>
> >> >> Not related to the "replayed op" warning, but related to the CTDB "lock
> >> >> issue":
> >> >>
> >> >> The "surviving" cephfs client tries to acquire a lock on a file, but
> >> >> although the other client is dead (but not yet evicted by the MDS) it
> >> >> can't. Not until the dead client is evicted by the MDS after ~ 300 sec
> >> >> (mds_session_autoclose=300). Turns out ctdb uses fcntl() locking. Does
> >> >> cephfs support this kind of locking in the way ctdb expects it to?
> >> >
> >> > We implement locking, and it's correct that another client can't gain
> >> > the lock until the first client is evicted. Aside from speeding up
> >> > eviction by modifying the timeout, if you have another mechanism for
> >> > detecting node failure then you could use that to explicitly evict the
> >> > client.
> >> >
> >> > John
> >> >
> >> >> In the meantime we will try [2] (rados object) as a recovery lock.
> >> >> Would eliminate a layer / dependency as well.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Gr. Stefan
> >> >>
> >> >> [1]: https://ctdb.samba.org/
> >> >> [2]: https://ctdb.samba.org/manpages/ctdb_mutex_ceph_rados_helper.7.html
> >> >>
> >> >> --
> >> >> | BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
> >> >> | GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
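
As an addendum to the locking part of the thread: ctdb's recovery lock uses
POSIX fcntl() byte-range locks, which CephFS does implement, and (as John
notes) a second client cannot take the lock until the first client's session
is gone -- either after mds_session_autoclose expires or after an explicit
eviction (on Luminous roughly "ceph tell mds.<rank> client evict id=<client
id>"; check the client-eviction documentation for your release for the exact
syntax). A minimal probe like the sketch below -- the mount path and lock-file
name are only examples, and this is not what ctdb itself runs -- makes the
behaviour easy to watch from the surviving node.

// Minimal fcntl() lock probe against a file on a CephFS mount.
// Sketch only: the default path below is just an example.
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {
  const char *path = argc > 1 ? argv[1] : "/mnt/cephfs/ctdb_recovery_lock";

  int fd = open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0) {
    perror("open");
    return 1;
  }

  struct flock fl;
  memset(&fl, 0, sizeof(fl));
  fl.l_type = F_WRLCK;    // exclusive write lock
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;           // 0 = lock the whole file

  // Non-blocking attempt: while the MDS still considers the dead client's
  // session alive (until mds_session_autoclose expires or the client is
  // evicted explicitly), this is expected to fail with EAGAIN or EACCES.
  if (fcntl(fd, F_SETLK, &fl) == -1) {
    printf("lock not acquired: %s\n", strerror(errno));
  } else {
    printf("lock acquired\n");
  }

  close(fd);
  return 0;
}

The ctdb_mutex_ceph_rados_helper referenced in [2] sidesteps the issue by
taking the recovery lock on a RADOS object via librados instead of on a
CephFS file, which removes the CephFS client (and its eviction timing) from
the recovery path.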