Hi John,

Quoting John Spray (jspray@xxxxxxxxxx):
> On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> When replaying a journal (either on MDS startup or on a standby-replay
> MDS), the replayed file creation operations are being checked for
> consistency with the state of the replayed client sessions. Client
> sessions have a "preallocated_inos" list that contains a set of inode
> numbers they should be using to create new files.
>
> There are two checks being done: a soft check (just log it) that the
> inode used for a new file is the same one that the session would be
> expected to use for a new file, and a hard check (assertion) that the
> inode used is one of the inode numbers that can be used for a new
> file. When that soft check fails, it doesn't indicate anything
> inconsistent in the metadata, just that the inodes are being used in
> an unexpected order.
>
> The WRN severity message mainly benefits our automated testing -- the
> hope would be that if we're hitting strange scenarios like this in
> automated tests then it would trigger a test failure (we fail tests
> if they emit unexpected warnings).

Thanks for the explanation.

> It would be interesting to know more about what's going on on your
> cluster when this is happening -- do you have standby-replay MDSs?
> Multiple active MDSs? Were any daemons failing over at a similar time
> to the warnings? Did you have anything funny going on with clients
> (like forcing them to reconnect after being evicted)?

Two MDSs in total: one active, one standby-replay.

The clients are doing "funny" stuff. We are testing CTDB [1] in
combination with CephFS to build an HA setup (to prevent split brain).
We have two clients that, in case of a failure, need to acquire a lock
on a file ("ctdb_recovery_lock") before doing a recovery. Somehow,
while configuring this setup, we triggered the "replayed op" warnings.
We have tried to reproduce them since, but no matter what we do, the
"replayed op" warnings do not occur anymore ...

We have seen these warnings before, with other clients. Those warnings
started after we had switched from mds1 -> mds2 (an upgrade of the Ceph
cluster following the MDS upgrade procedure, with reboots afterwards,
hence the failover). Something I just realised is that _only_ the
standby-replay MDS is emitting the warnings, not the active MDS.

Not related to the "replayed op" warning, but related to the CTDB
"lock issue": the "surviving" CephFS client tries to acquire a lock on
the file, but although the other client is dead (though not yet evicted
by the MDS), it can't. Not until the dead client is evicted by the MDS
after ~300 seconds (mds_session_autoclose=300). It turns out ctdb uses
fcntl() locking (a small sketch of what that boils down to is appended
at the end of this mail). Does CephFS support this kind of locking in
the way ctdb expects it to?

In the meantime we will try the RADOS-object-based recovery lock helper
[2] instead; that would eliminate a layer / dependency as well (a rough
sketch of that idea is also appended below).

Thanks,

Gr. Stefan

[1]: https://ctdb.samba.org/
[2]: https://ctdb.samba.org/manpages/ctdb_mutex_ceph_rados_helper.7.html

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
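
In case it helps, here is a minimal sketch of the kind of fcntl() lock
ctdb takes on the recovery lock file, i.e. what the surviving client
keeps retrying until the dead client's session is gone. ctdb does this
in C; this is just the same idea in Python, and the mount path and
retry interval are made up for illustration:

#!/usr/bin/env python3
# Sketch only (not ctdb's actual code): take an exclusive fcntl() lock
# on the recovery lock file, retrying while another client still holds it.
import fcntl
import time

LOCK_FILE = "/mnt/cephfs/ctdb/ctdb_recovery_lock"  # hypothetical path

# Open without truncating; lockf() with LOCK_EX needs write access.
with open(LOCK_FILE, "a") as f:
    while True:
        try:
            # Non-blocking exclusive lock over the whole file.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            print("recovery lock acquired")
            break
        except OSError:
            # While the dead client's session (and its locks) is still
            # known to the MDS, this keeps failing -- which matches the
            # ~300 s (mds_session_autoclose) delay we saw before eviction.
            print("lock still held by the other node, retrying ...")
            time.sleep(5)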
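
And this is roughly what the RADOS-object-based recovery lock from [2]
boils down to: an exclusive lock on a RADOS object, taken directly via
librados instead of through the MDS. In practice we would just use
ctdb_mutex_ceph_rados_helper as documented in [2]; the sketch below
uses the python-rados bindings only to illustrate the idea, and the
pool, object and client names are made-up examples:

import rados

# Sketch only: the real setup would use ctdb_mutex_ceph_rados_helper [2].
# The pool ("ctdb"), object name, cookie and client name are illustrative.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf", name="client.ctdb")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("ctdb")
    try:
        # Exclusive lock on the object; this raises an error while another
        # node still holds the lock.
        ioctx.lock_exclusive("ctdb_recovery_lock", "reclock", "node-a",
                             desc="ctdb recovery lock")
        print("recovery lock acquired")
        # ... recovery would happen here ...
        ioctx.unlock("ctdb_recovery_lock", "reclock", "node-a")
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

If CephFS's fcntl() semantics turn out not to match what ctdb expects,
this at least keeps the recovery lock independent of MDS session
timeouts.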