Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.2.16

On Tue, Mar 2, 2021 at 6:02 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> On 3/2/21 5:42 PM, Ilya Dryomov wrote:
> > On Tue, Mar 2, 2021 at 9:26 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> We have a CentOS 7 VM with a mainline kernel (5.11.2-1.el7.elrepo.x86_64
> >> #1 SMP Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and
> >> with Ceph Octopus 15.2.9 packages installed. The MDS server is running
> >> Nautilus 14.2.16. Messenger v2 has been enabled. Port 3300 of the
> >> monitors is reachable from the client. At mount time we get the following:
> >>
> >>> Mar  2 09:01:14  kernel: Key type ceph registered
> >>> Mar  2 09:01:14  kernel: libceph: loaded (mon/osd proto 15/24)
> >>> Mar  2 09:01:14  kernel: FS-Cache: Netfs 'ceph' registered for caching
> >>> Mar  2 09:01:14  kernel: ceph: loaded (mds proto 32)
> >>> Mar  2 09:01:14  kernel: libceph: mon4 (1)[mon addr]:6789 session established
> >>> Mar  2 09:01:14  kernel: libceph: another match of type 1 in addrvec
> >>> Mar  2 09:01:14  kernel: ceph: corrupt mdsmap
> >>> Mar  2 09:01:14  kernel: ceph: error decoding mdsmap -22
> >>> Mar  2 09:01:14  kernel: libceph: another match of type 1 in addrvec
> >>> Mar  2 09:01:14  kernel: libceph: corrupt full osdmap (-22) epoch 98764 off 6357 (0000000027a57a75 of 00000000d3075952-00000000e307797f)
> >>> Mar  2 09:02:15  kernel: ceph: No mds server is up or the cluster is laggy
> >>
> >> The /etc/ceph/ceph.conf has been adjusted to reflect the messenger v2
> >> changes: ms_bind_ipv6=true, ms_bind_ipv4=false. The kernel client still
> >> seems to be using the v1 port though (although v2 should be supported
> >> since 5.11).
> >>
> >> Has anyone seen this before? Just guessing here, but could it be that
> >> the client tries to speak the v2 protocol on the v1 port?
> >
> > Hi Stefan,
> >
> > Those "another match of type 1" errors suggest that you have two
> > different v1 addresses for some of or all OSDs and MDSes in osdmap
> > and mdsmap respectively.
> >
> > What is the output of "ceph osd dump" and "ceph fs dump"?
>
> That's a lot of output, so I trimmed it:
>
> --- snip ---
> osd.0 up   in  weight 1 up_from 98071 up_thru 98719 down_at 98068
> last_clean_interval [96047,98067)
> [v2:[2001:7b8:80:1:0:1:2:1]:6848/505534,v1:[2001:7b8:80:1:0:1:2:1]:6854/505534,v2:0.0.0.0:6860/505534,v1:0.0.0.0:6866/505534]

Where did "v2:0.0.0.0:6860/505534,v1:0.0.0.0:6866/505534" come from?
This is what confuses the kernel client: it sees two addresses of
the same type and doesn't know which one to pick.  Instead of blindly
picking the first one (or some other dubious heuristic) it just denies
the osdmap.
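
If it helps to narrow down which OSDs are affected, something along
these lines should work (untested, just grepping the plain "ceph osd
dump" output):

    # per-OSD lines with their address vectors
    ceph osd dump | grep '^osd\.'

    # only the OSDs whose vector carries more than one v1 entry
    ceph osd dump | awk '/^osd\./ && gsub(/v1:/, "&") > 1'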

> [mds.mds1{0:229930080} state up:active seq 144042 addr
> [v2:[2001:7b8:80:1:0:1:3:1]:6800/2234186180,v1:[2001:7b8:80:1:0:1:3:1]:6801/2234186180,v2:0.0.0.0:6802/2234186180,v1:0.0.0.0:6803/2234186180]]

Same for the mdsmap.
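
The same kind of filter should work on the fs dump (again untested):

    # MDS entries with their address vectors
    ceph fs dump | grep 'state up:'

    # only the entries whose vector carries a duplicated v1 address
    ceph fs dump | awk '/state up:/ && gsub(/v1:/, "&") > 1'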

Were you using ipv6 with the kernel client before upgrading to 5.11?

What is the output of "ceph daemon osd.0 config get ms_bind_ipv4" on
the osd.0 node?
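
If you want to check all of the daemons in one go, a loop over the
admin sockets on each OSD/MDS host should do it -- a rough sketch,
assuming the default /var/run/ceph socket paths:

    # what the running daemons actually have for the bind options
    for sock in /var/run/ceph/ceph-*.asok; do
        echo "== $sock"
        ceph --admin-daemon "$sock" config get ms_bind_ipv4
        ceph --admin-daemon "$sock" config get ms_bind_ipv6
    done

Note that the bind options are applied when a daemon starts, so the
daemons need to be restarted after changing them in ceph.conf.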

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


