Re: Ceph MDS WRN replayed op client.$id

On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> Hi,
>
> Once in a while, today a bit more often, the MDS is logging the
> following:
>
> mds.mds1 [WRN]  replayed op client.15327973:15585315,15585103 used ino
> 0x100009918de but session next is 0x10000873b8b
>
> Nothing of importance is logged in the mds ("debug_mds_log": "1/5").
>
> What does this warning message mean / indicate?

When replaying a journal (either on MDS startup or on a standby-replay
MDS), the MDS checks the replayed file-creation operations for
consistency against the state of the replayed client sessions.  Each
client session has a "preallocated inos" list: a set of inode numbers
the client should use when creating new files.

There are two checks being done: a soft check (just log it) that the
inode used for a new file is the same one that the session would be
expected to use for a new file, and a hard check (assertion) that the
inode used is one of the inode numbers that can be used for a new
file.  When that soft check fails, it doesn't indicate anything
inconsistent in the metadata, just that the inodes are being used in
an unexpected order.
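
To make that a bit more concrete, here's a rough sketch of the idea in
C++ -- simplified, and not the actual MDS code (the names Session,
prealloc_inos, next_ino and check_replayed_ino are just illustrative of
how the replayed session state is used):

// Simplified sketch of the replay-time checks (illustrative only,
// not the real Ceph implementation).
#include <cassert>
#include <cstdint>
#include <iostream>
#include <set>

struct Session {
  // Inode numbers handed out to this client in advance for new files.
  std::set<uint64_t> prealloc_inos;

  // The inode number the session would be expected to use next.
  uint64_t next_ino() const { return *prealloc_inos.begin(); }
};

void check_replayed_ino(Session& s, uint64_t used_ino) {
  // Soft check: just a warning; the metadata is still consistent.
  if (used_ino != s.next_ino()) {
    std::cout << "WRN: replayed op used ino 0x" << std::hex << used_ino
              << " but session next is 0x" << s.next_ino() << std::dec
              << "\n";
  }
  // Hard check: the inode must at least come from the preallocated set.
  assert(s.prealloc_inos.count(used_ino) == 1);
  s.prealloc_inos.erase(used_ino);
}

int main() {
  Session s;
  s.prealloc_inos = {0x1000, 0x1001, 0x1002};

  check_replayed_ino(s, 0x1000);  // expected order: silent
  check_replayed_ino(s, 0x1002);  // out of order: warning, but harmless
  // check_replayed_ino(s, 0x9999);  // not preallocated: would assert
}

The message you're seeing corresponds to the soft check, which is why
it only shows up as a WRN rather than tripping an assertion.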

The WRN severity message mainly benefits our automated testing -- the
hope would be that if we're hitting strange scenarios like this in
automated tests then it would trigger a test failure (we fail tests
if they emit unexpected warnings).

It would be interesting to know more about what's happening on your
cluster when these warnings appear -- do you have standby-replay MDSs?
Multiple active MDSs?  Were any daemons failing over at around the same
time as the warnings?  Was anything unusual going on with clients
(like forcing them to reconnect after being evicted)?

John

> At some point this client (ceph-fuse, mimic 13.2.1) triggers the following:
>
> mon.mon1 [WRN] Health check failed: 1 MDSs report slow requests
> (MDS_SLOW_REQUEST)
> mds.mds2 [WRN] 1 slow requests, 1 included below; oldest blocked for >
> 30.911624 secs
> mds.mds2 [WRN] slow request 30.911624 seconds old, received at
> 2018-09-12 15:18:44.739321: client_request(client.15732335:9506 lookup
> #0x100006901a7/ctdb_recovery_lock caller_uid=0, caller_gid=0{}) currently failed to
> rdlock, waiting
>
> mds logging:
>
> 2018-09-12 11:35:07.373091 7f80af91e700  0 -- [2001:7b8:80:3:0:2c:3:2]:6800/1086374448 >> [2001:7b8:81:7::11]:0/2366241118 conn(0x56332404f000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
> 2018-09-12 13:24:17.000787 7f80af91e700  0 -- [2001:7b8:80:3:0:2c:3:2]:6800/1086374448 >> [2001:7b8:81:7::11]:0/526035198 conn(0x56330c726000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
> 2018-09-12 15:21:17.176405 7f80af91e700  0 -- [2001:7b8:80:3:0:2c:3:2]:6800/1086374448 >> [2001:7b8:81:7::11]:0/526035198 conn(0x56330c726000 :6800 s=STATE_OPEN pgs=3 cs=1 l=0).fault server, going to standby
> 2018-09-12 15:22:26.641501 7f80af91e700  0 -- [2001:7b8:80:3:0:2c:3:2]:6800/1086374448 >> [2001:7b8:81:7::11]:0/526035198 conn(0x5633678f7000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
> 2018-09-12 15:22:26.641694 7f80af91e700  0 -- [2001:7b8:80:3:0:2c:3:2]:6800/1086374448 >> [2001:7b8:81:7::11]:0/526035198 conn(0x5633678f7000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 2 vs existing csq=1 existing_state=STATE_STANDBY
> 2018-09-12 15:22:26.641971 7f80af91e700  0 -- [2001:7b8:80:3:0:2c:3:2]:6800/1086374448 >> [2001:7b8:81:7::11]:0/526035198 conn(0x56330c726000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=3 cs=1 l=0).handle_connect_msg: challenging authorizer
>
> Thanks,
>
> Stefan
>
>
> --
> | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
> | GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


