On Tue, Mar 2, 2021 at 6:02 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> On 3/2/21 5:42 PM, Ilya Dryomov wrote:
> > On Tue, Mar 2, 2021 at 9:26 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On a CentOS 7 VM with mainline kernel (5.11.2-1.el7.elrepo.x86_64 #1 SMP
> >> Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and with
> >> Ceph Octopus 15.2.9 packages installed. The MDS server is running
> >> Nautilus 14.2.16. Messenger v2 has been enabled. Port 3300 of the
> >> monitors is reachable from the client. At mount time we get the following:
> >>
> >>> Mar 2 09:01:14 kernel: Key type ceph registered
> >>> Mar 2 09:01:14 kernel: libceph: loaded (mon/osd proto 15/24)
> >>> Mar 2 09:01:14 kernel: FS-Cache: Netfs 'ceph' registered for caching
> >>> Mar 2 09:01:14 kernel: ceph: loaded (mds proto 32)
> >>> Mar 2 09:01:14 kernel: libceph: mon4 (1)[mond addr]:6789 session established
> >>> Mar 2 09:01:14 kernel: libceph: another match of type 1 in addrvec
> >>> Mar 2 09:01:14 kernel: ceph: corrupt mdsmap
> >>> Mar 2 09:01:14 kernel: ceph: error decoding mdsmap -22
> >>> Mar 2 09:01:14 kernel: libceph: another match of type 1 in addrvec
> >>> Mar 2 09:01:14 kernel: libceph: corrupt full osdmap (-22) epoch 98764 off 6357 (0000000027a57a75 of 00000000d3075952-00000000e307797f)
> >>> Mar 2 09:02:15 kernel: ceph: No mds server is up or the cluster is laggy
> >>
> >> The /etc/ceph/ceph.conf has been adjusted to reflect the messenger v2
> >> changes. ms_bind_ipv6=true, ms_bind_ipv4=false. The kernel client still
> >> seems to use the v1 port though (although since 5.11 v2 should be
> >> supported).
> >>
> >> Has anyone seen this before? Just guessing here, but could it be that the
> >> client tries to speak v2 protocol on v1 port?
> >
> > Hi Stefan,
> >
> > Those "another match of type 1" errors suggest that you have two
> > different v1 addresses for some of or all OSDs and MDSes in osdmap
> > and mdsmap respectively.
> >
> > What is the output of "ceph osd dump" and "ceph fs dump"?
>
> That's a lot of output, so I trimmed it:
>
> --- snip ---
> osd.0 up in weight 1 up_from 98071 up_thru 98719 down_at 98068
> last_clean_interval [96047,98067)
> [v2:[2001:7b8:80:1:0:1:2:1]:6848/505534,v1:[2001:7b8:80:1:0:1:2:1]:6854/505534,v2:0.0.0.0:6860/505534,v1:0.0.0.0:6866/505534]

Where did "v2:0.0.0.0:6860/505534,v1:0.0.0.0:6866/505534" come from?
This is what confuses the kernel client: it sees two addresses of
the same type and doesn't know which one to pick. Instead of blindly
picking the first one (or applying some other dubious heuristic), it
just rejects the osdmap.

> [mds.mds1{0:229930080} state up:active seq 144042 addr
> [v2:[2001:7b8:80:1:0:1:3:1]:6800/2234186180,v1:[2001:7b8:80:1:0:1:3:1]:6801/2234186180,v2:0.0.0.0:6802/2234186180,v1:0.0.0.0:6803/2234186180]]

Same for the mdsmap. Were you using ipv6 with the kernel client before
upgrading to 5.11?

What is the output of "ceph daemon osd.0 config get ms_bind_ipv4" on the
osd0 node?

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
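
To illustrate the "two addresses of the same type" problem, here is a rough
sketch in Python. It is not the kernel client's actual code (that is C in
libceph); the names parse_addrvec, pick_addr and AddrvecError are made up for
this example, and the address types are plain strings taken from the dump
output. It only mirrors the behaviour described above: if more than one entry
of the wanted type is present, refuse the whole vector instead of guessing.

```python
class AddrvecError(ValueError):
    """Raised when an address vector is ambiguous (hypothetical name)."""


def parse_addrvec(text):
    """Split '[v2:addr,v1:addr,...]' into a list of (type, addr) pairs.

    Assumes the textual form printed by "ceph osd dump"; IPv6 addresses
    contain colons but no commas, so splitting on ',' is safe here.
    """
    inner = text.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    entries = []
    for item in inner.split(","):
        # Only the first ':' separates the type from the address.
        addr_type, _, addr = item.partition(":")
        entries.append((addr_type, addr))
    return entries


def pick_addr(entries, wanted):
    """Return the single address of type `wanted`, or None if absent.

    A second match is treated as an error rather than resolved by
    picking the first one.
    """
    matches = [addr for addr_type, addr in entries if addr_type == wanted]
    if len(matches) > 1:
        raise AddrvecError(f"another match of type {wanted} in addrvec")
    return matches[0] if matches else None


if __name__ == "__main__":
    osd0 = ("[v2:[2001:7b8:80:1:0:1:2:1]:6848/505534,"
            "v1:[2001:7b8:80:1:0:1:2:1]:6854/505534,"
            "v2:0.0.0.0:6860/505534,v1:0.0.0.0:6866/505534]")
    try:
        pick_addr(parse_addrvec(osd0), "v1")
    except AddrvecError as err:
        print("rejected:", err)
```

Fed the osd.0 entry from the dump, the sketch refuses the vector because of
the extra 0.0.0.0 entries; with only the two IPv6 entries it would return the
single v1 address.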