Hi,
from your response I understand that these messages are not expected
if everything is healthy.
We see them every now and then, three or four times a week, but
there's no obvious correlation with specific jobs or with high load on
the cluster. It's a Luminous cluster (12.2.7) with 1 active, 1
standby-replay and 1 standby MDS.
Since it's only the standby-replay server reporting this and the failover
works fine, we didn't really bother. But what can we do to prevent this
from happening? The messages appear quite randomly, so I don't really
know when to increase the debug log level.
Any hint would be highly appreciated!
Regards,
Eugen
Quoting John Spray <jspray@xxxxxxxxxx>:
On Thu, Sep 13, 2018 at 11:01 AM Stefan Kooman <stefan@xxxxxx> wrote:
Hi John,
Quoting John Spray (jspray@xxxxxxxxxx):
> On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> When replaying a journal (either on MDS startup or on a standby-replay
> MDS), the replayed file creation operations are being checked for
> consistency with the state of the replayed client sessions. Client
> sessions have a "preallocated_inos" list that contains a set of inode
> numbers they should be using to create new files.
>
> There are two checks being done: a soft check (just log it) that the
> inode used for a new file is the same one that the session would be
> expected to use for a new file, and a hard check (assertion) that the
> inode used is one of the inode numbers that can be used for a new
> file. When that soft check fails, it doesn't indicate anything
> inconsistent in the metadata, just that the inodes are being used in
> an unexpected order.
>
> The WRN severity message mainly benefits our automated testing -- the
> hope would be that if we're hitting strange scenarios like this in
> automated tests then it would trigger a test failure (we fail tests
> if they emit unexpected warnings).
Thanks for the explanation.
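If I understand the two checks right, the replay logic is roughly this
(my own Python sketch for my own understanding, not the actual MDS code;
all names are invented):

import logging
from dataclasses import dataclass

log = logging.getLogger("mds.replay")

@dataclass
class Session:
    client_id: int
    prealloc_inos: set

def check_replayed_create(session, used_ino):
    """Check the inode number a replayed file-create operation used."""
    # Hard check (assertion): the ino must be one of the inode numbers
    # preallocated to this client session, otherwise replay is inconsistent.
    assert used_ino in session.prealloc_inos, "replayed op used an unknown ino"

    # Soft check (just log it): we expect the lowest preallocated ino to be
    # used next; if a different one shows up, the metadata is still fine,
    # the inodes were simply used in an unexpected order.
    expected = min(session.prealloc_inos)
    if used_ino != expected:
        log.warning("client.%s used ino %#x but expected ino %#x",
                    session.client_id, used_ino, expected)

    session.prealloc_inos.discard(used_ino)

# Example: the soft check fires, the hard check passes:
# check_replayed_create(Session(4815, {0x10000000001, 0x10000000002}), 0x10000000002)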
> It would be interesting to know more about what's going on on your
> cluster when this is happening -- do you have standby replay MDSs?
> Multiple active MDSs? Were any daemons failing over at a similar time
> to the warnings? Did you have anything funny going on with clients
> (like forcing them to reconnect after being evicted)?
Two MDSs in total. One active, one standby-replay. The clients are doing
"funny" stuff. We are testing "CTDB" [1] in combination with cephfs to
build a HA setup (to prevent split brain). We have two clients that, in
case of a failure, need to acquire a lock on a file "ctdb_recovery_lock"
before doing a recovery. Somehow, while configuring this setup, we
triggered the "replayed op" warnings. We have tried to reproduce them, but
no matter what we do, the "replayed op" warnings do not occur anymore ...
We have seen these warnings before (with other clients). The warnings
started after we switched from mds1 -> mds2 (upgrade of the Ceph cluster
according to the MDS upgrade procedure, with reboots afterwards, hence the
failover).
Something I just realised is that _only_ the standby-replay MDS
is emitting the warnings, not the active MDS.
Not related to the "replayed op" warning, but related to the CTDB "lock
issue":
The "surviving" cephfs client tries to acquire a lock on a file, but
although the other client is dead (but not yet evicted by the MDS) it
can't. Not until the dead client is evicted by the MDS after ~ 300 sec
(mds_session_autoclose=300). Turns out ctdb uses fcntl() locking. Does
cephfs support this kind of locking in the way ctdb expects it to?
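What ctdb does boils down to something like the following (a minimal
Python sketch of the behaviour we see; the mount path is made up):

import errno
import fcntl

RECLOCK = "/mnt/cephfs/ctdb/ctdb_recovery_lock"  # made-up path on the cephfs mount

def try_recovery_lock(path=RECLOCK):
    fd = open(path, "w")
    try:
        # Non-blocking exclusive POSIX (fcntl) lock, like ctdb takes.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd  # keep the file open to hold the lock
    except OSError as err:
        if err.errno in (errno.EAGAIN, errno.EACCES):
            # Still held by the other (dead) client; this only clears once
            # the MDS has evicted that client's session.
            fd.close()
            return None
        raise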
We implement locking, and it's correct that another client can't gain
the lock until the first client is evicted. Aside from speeding up
eviction by modifying the timeout, if you have another mechanism for
detecting node failure then you could use that to explicitly evict the
client.
John
In the meantime we will try [2] (a rados object) as the recovery lock.
That would eliminate a layer / dependency as well.
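As far as I understand it, that helper essentially takes an exclusive
advisory lock on a RADOS object, along these lines (a rough python-rados
sketch, not the helper itself; pool, object and client names are made up):

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf", name="client.ctdb")
cluster.connect()
ioctx = cluster.open_ioctx("ctdb")  # made-up pool
try:
    # With a duration the lock expires on its own if the holder dies and
    # never renews it, so the surviving node doesn't have to wait for an
    # MDS eviction like with the fcntl lock on cephfs.
    ioctx.lock_exclusive("ctdb_recovery_lock", "ctdb_reclock", "node-a",
                         desc="ctdb recovery lock", duration=10)
    print("got the recovery lock")
except rados.ObjectBusy:
    print("another node still holds the recovery lock")
finally:
    ioctx.close()
    cluster.shutdown()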
Thanks,
Gr. Stefan
[1]: https://ctdb.samba.org/
[2]: https://ctdb.samba.org/manpages/ctdb_mutex_ceph_rados_helper.7.html
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com