On Wed, Sep 19, 2018 at 10:37 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi John,
>
> > I'm not 100% sure of that. It could be that there's a path through
> > the code that's healthy, but just wasn't anticipated at the point that
> > warning message was added. I wish I had a more unambiguous response
> > to give!
>
> then I guess we'll just keep ignoring these warnings from the replay
> mds until we hit a real issue. ;-)
>
> It's probably impossible to predict any improvement on this with mimic, right?

Yeah, since we haven't knowingly done anything about it, it would be a
(pleasant) surprise if it was accidentally resolved in mimic ;-)

John

> Regards,
> Eugen
>
>
> Zitat von John Spray <jspray@xxxxxxxxxx>:
>
> > On Mon, Sep 17, 2018 at 2:49 PM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> from your response I understand that these messages are not expected
> >> if everything is healthy.
> >
> > I'm not 100% sure of that. It could be that there's a path through
> > the code that's healthy, but just wasn't anticipated at the point that
> > warning message was added. I wish I had a more unambiguous response
> > to give!
> >
> > John
> >
> >> We face them every now and then, three or four times a week, but
> >> there's no real connection to specific jobs or a high load in our
> >> cluster. It's a Luminous cluster (12.2.7) with 1 active, 1
> >> standby-replay and 1 standby MDS.
> >> Since it's only the replay server reporting this and the failover
> >> works fine we didn't really bother. But what can we do to prevent this
> >> from happening? The messages appear quite randomly, so I don't really
> >> know when to increase the debug log level.
> >>
> >> Any hint would be highly appreciated!
> >>
> >> Regards,
> >> Eugen
> >>
> >>
> >> Zitat von John Spray <jspray@xxxxxxxxxx>:
> >>
> >> > On Thu, Sep 13, 2018 at 11:01 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >> >>
> >> >> Hi John,
> >> >>
> >> >> Quoting John Spray (jspray@xxxxxxxxxx):
> >> >>
> >> >> > On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman <stefan@xxxxxx> wrote:
> >> >> >
> >> >> > When replaying a journal (either on MDS startup or on a standby-replay
> >> >> > MDS), the replayed file creation operations are being checked for
> >> >> > consistency with the state of the replayed client sessions. Client
> >> >> > sessions have a "preallocated_inos" list that contains a set of inode
> >> >> > numbers they should be using to create new files.
> >> >> >
> >> >> > There are two checks being done: a soft check (just log it) that the
> >> >> > inode used for a new file is the same one that the session would be
> >> >> > expected to use for a new file, and a hard check (assertion) that the
> >> >> > inode used is one of the inode numbers that can be used for a new
> >> >> > file. When that soft check fails, it doesn't indicate anything
> >> >> > inconsistent in the metadata, just that the inodes are being used in
> >> >> > an unexpected order.
> >> >> >
> >> >> > The WRN severity message mainly benefits our automated testing -- the
> >> >> > hope would be that if we're hitting strange scenarios like this in
> >> >> > automated tests then it would trigger a test failure (we fail tests
> >> >> > if they emit unexpected warnings).
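
For readers who want to see the shape of the logic John describes, here is a
rough, self-contained sketch of the two checks. This is not the actual Ceph
MDS code -- the type and member names (ReplaySession, prealloc_inos,
replay_alloc) are invented for the example -- but it shows the idea: the soft
check only logs a warning when inodes are consumed out of the expected order,
while the hard check asserts that a replayed create used an inode from the
session's preallocated set at all.

// Illustrative sketch only -- not the actual Ceph code; the names used
// here (ReplaySession, prealloc_inos, replay_alloc) are invented.
#include <cassert>
#include <cstdint>
#include <deque>
#include <iostream>

using inodeno_t = uint64_t;

struct ReplaySession {
  // Inode numbers preallocated to this client for creating new files.
  std::deque<inodeno_t> prealloc_inos;

  // Called while replaying a journaled "create" that consumed inode `used`.
  void replay_alloc(inodeno_t used) {
    // Soft check: normally the next preallocated inode is the one consumed.
    // A mismatch only means the inodes were used in an unexpected order,
    // so we just log a warning -- the metadata itself is still consistent.
    if (prealloc_inos.empty() || prealloc_inos.front() != used) {
      std::cerr << "WRN: replayed op used ino 0x" << std::hex << used
                << std::dec << ", not the expected next preallocated ino\n";
    }

    // Hard check: the inode must at least be somewhere in the session's
    // preallocated set; anything else would be a real inconsistency.
    bool found = false;
    for (auto it = prealloc_inos.begin(); it != prealloc_inos.end(); ++it) {
      if (*it == used) {
        prealloc_inos.erase(it);
        found = true;
        break;
      }
    }
    assert(found && "replayed op used an inode outside the preallocated set");
  }
};

int main() {
  ReplaySession s;
  s.prealloc_inos = {0x1000, 0x1001, 0x1002};
  s.replay_alloc(0x1000);  // expected order: silent
  s.replay_alloc(0x1002);  // out of order: soft check warns, hard check passes
  s.replay_alloc(0x1001);  // now the front of the remaining set: silent
  return 0;
}

The warning discussed in this thread corresponds to the soft check; the fact
that failover keeps working fine is consistent with the hard check never
firing.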
> >> >>
> >> >> Thanks for the explanation.
> >> >>
> >> >> > It would be interesting to know more about what's going on on your
> >> >> > cluster when this is happening -- do you have standby replay MDSs?
> >> >> > Multiple active MDSs? Were any daemons failing over at a similar time
> >> >> > to the warnings? Did you have anything funny going on with clients
> >> >> > (like forcing them to reconnect after being evicted)?
> >> >>
> >> >> Two MDSs in total. One active, one standby-replay. The clients are doing
> >> >> "funny" stuff. We are testing "CTDB" [1] in combination with cephfs to
> >> >> build a HA setup (to prevent split brain). We have two clients that, in
> >> >> case of a failure, need to acquire a lock on a file "ctdb_recovery_lock"
> >> >> before doing a recovery. Somehow, while configuring this setup, we
> >> >> triggered the "replayed op" warnings. We have tried to reproduce that,
> >> >> but no matter what we do the "replayed op" warnings do not occur
> >> >> anymore ...
> >> >>
> >> >> We have seen these warnings before (other clients). Warnings started
> >> >> after we had switched from mds1 -> mds2 (upgrade of Ceph cluster
> >> >> according to MDS upgrade procedure, reboots afterwards, hence the
> >> >> failover).
> >> >>
> >> >> Something I just realised is that _only_ the standby-replay MDS
> >> >> is emitting the warnings, not the active MDS.
> >> >>
> >> >> Not related to the "replayed op" warning, but related to the CTDB "lock
> >> >> issue":
> >> >>
> >> >> The "surviving" cephfs client tries to acquire a lock on a file, but
> >> >> although the other client is dead (but not yet evicted by the MDS) it
> >> >> can't. Not until the dead client is evicted by the MDS after ~ 300 sec
> >> >> (mds_session_autoclose=300). Turns out ctdb uses fcntl() locking. Does
> >> >> cephfs support this kind of locking in the way ctdb expects it to?
> >> >
> >> > We implement locking, and it's correct that another client can't gain
> >> > the lock until the first client is evicted. Aside from speeding up
> >> > eviction by modifying the timeout, if you have another mechanism for
> >> > detecting node failure then you could use that to explicitly evict the
> >> > client.
> >> >
> >> > John
> >> >
> >> >> In the meantime we will try [2] (rados object) as a recovery lock.
> >> >> Would eliminate a layer / dependency as well.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Gr. Stefan
> >> >>
> >> >> [1]: https://ctdb.samba.org/
> >> >> [2]: https://ctdb.samba.org/manpages/ctdb_mutex_ceph_rados_helper.7.html
> >> >>
> >> >> --
> >> >> | BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
> >> >> | GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
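
As an addendum to the locking part of the thread: ctdb's recovery lock uses
POSIX fcntl() byte-range locks, which CephFS does implement, and (as John
notes) a second client cannot take the lock until the first client's session
is gone -- either after mds_session_autoclose expires or after an explicit
eviction (on Luminous roughly "ceph tell mds.<rank> client evict id=<client
id>"; check the client-eviction documentation for your release for the exact
syntax). A minimal probe like the sketch below -- the mount path and lock-file
name are only examples, and this is not what ctdb itself runs -- makes the
behaviour easy to watch from the surviving node.

// Minimal fcntl() lock probe against a file on a CephFS mount.
// Sketch only: the default path below is just an example.
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {
  const char *path = argc > 1 ? argv[1] : "/mnt/cephfs/ctdb_recovery_lock";

  int fd = open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0) {
    perror("open");
    return 1;
  }

  struct flock fl;
  memset(&fl, 0, sizeof(fl));
  fl.l_type = F_WRLCK;    // exclusive write lock
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;           // 0 = lock the whole file

  // Non-blocking attempt: while the MDS still considers the dead client's
  // session alive (until mds_session_autoclose expires or the client is
  // evicted explicitly), this is expected to fail with EAGAIN or EACCES.
  if (fcntl(fd, F_SETLK, &fl) == -1) {
    printf("lock not acquired: %s\n", strerror(errno));
  } else {
    printf("lock acquired\n");
  }

  close(fd);
  return 0;
}

The ctdb_mutex_ceph_rados_helper referenced in [2] sidesteps the issue by
taking the recovery lock on a RADOS object via librados instead of on a
CephFS file, which removes the CephFS client (and its eviction timing) from
the recovery path.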