Re: mds complains about "wrong node", stuck in replay

John Spray <jspray@xxxxxxxxxx> · Mon, 4 Jan 2016 10:36:41 +0000

On Wed, Dec 30, 2015 at 5:06 PM, Bryan Wright <bkw1a@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> I have an mds cluster stuck in replay.  The mds log file is filled with
> errors like the following:
>
> 2015-12-30 12:00:25.912026 7f9f5b88b700  0 -- 192.168.1.31:6800/13093 >>
> 192.168.1.24:6823/31155 pipe(0x4ccc800 sd=18 :44201 s=1 pgs=0 cs=0 l=1
> c=0x4bb1e40).connect claims to be 192.168.1.24:6823/15059 not
> 192.168.1.24:6823/31155 - wrong node!
>
> Restarting all of the OSDs, mons, and MDSs causes the error message
> to refer to a different OSD.
>
> What's going on here?

What does the network between the MDS and the other daemons look like?
Messages like that make me wonder if there is some NAT or other funky
routing going on.
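
To narrow it down, it may help to check which part of the address
actually differs in those messages. Below is a minimal Python sketch,
assuming the wrapped log entry has been joined back onto one line. The
helper name parse_wrong_node and the "ip"/"port"/"nonce" field labels
are made up for illustration and are not part of Ceph.

```python
import re

# Matches the tail of a "wrong node" log line:
#   "... claims to be IP:PORT/N not IP:PORT/N - wrong node!"
WRONG_NODE_RE = re.compile(
    r"claims to be (\S+):(\d+)/(\d+) not (\S+):(\d+)/(\d+) - wrong node"
)

FIELDS = ("ip", "port", "nonce")  # labels are an assumption for this sketch


def parse_wrong_node(line):
    """Return claimed/expected addresses and the fields that differ.

    Returns None if the line is not a "wrong node" message.
    """
    m = WRONG_NODE_RE.search(line)
    if m is None:
        return None
    claimed = dict(zip(FIELDS, m.group(1, 2, 3)))
    expected = dict(zip(FIELDS, m.group(4, 5, 6)))
    differs = [f for f in FIELDS if claimed[f] != expected[f]]
    return {"claimed": claimed, "expected": expected, "differs": differs}


# The log entry quoted above, rejoined onto a single line.
line = (
    "2015-12-30 12:00:25.912026 7f9f5b88b700  0 -- 192.168.1.31:6800/13093 >> "
    "192.168.1.24:6823/31155 pipe(0x4ccc800 sd=18 :44201 s=1 pgs=0 cs=0 l=1 "
    "c=0x4bb1e40).connect claims to be 192.168.1.24:6823/15059 not "
    "192.168.1.24:6823/31155 - wrong node!"
)

result = parse_wrong_node(line)
print(result["differs"])  # → ['nonce']
```

For the line you posted, the IP and port match and only the number
after the slash differs. If the IP or port differed in other entries,
that would point more strongly at NAT or routing.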

John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com