Thank you for your research, Frédéric,

We looked and the conf files were up to date, in the form [v1:(...),v2:(...)].

I managed to reproduce the "incident":

[aevoo-test - ceph-0]# ceph mon dump -f json | jq '.mons[].public_addrs'
dumped monmap epoch 2
{
  "addrvec": [
    {
      "type": "v2",
      "addr": "(IP):3300",
      "nonce": 0
    },
    {
      "type": "v1",
      "addr": "(IP):6789",
      "nonce": 0
    }
  ]
}

[aevoo-test - ceph-0]# ceph report | jq '.osdmap.osds[].public_addrs'
report 1958031869
{
  "addrvec": [
    {
      "type": "v2",
      "addr": "(IP):6800",
      "nonce": 116888
    },
    {
      "type": "v1",
      "addr": "(IP):6801",
      "nonce": 116888
    }
  ]
}

[aevoo-test - ceph-0]# ceph mon set-addrs home [v1:(IP):6789/0,v2:(IP):3300/0]

[aevoo-test - ceph-0]# ceph mon dump -f json | jq '.mons[].public_addrs'
dumped monmap epoch 3
{
  "addrvec": [
    {
      "type": "v1",
      "addr": "(IP):6789",
      "nonce": 0
    },
    {
      "type": "v2",
      "addr": "(IP):3300",
      "nonce": 0
    }
  ]
}

After OSD restart:

[aevoo-test - ceph-0]# ceph report | jq '.osdmap.osds[].public_addrs'
report 2993464839
{
  "addrvec": [
    {
      "type": "v1",
      "addr": "(IP):6801",
      "nonce": 117895
    }
  ]
}

On Thu, Jul 18, 2024 at 12:30, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

> Hi Albert, David,
>
> I came across this: https://github.com/ceph/ceph/pull/47421
>
> "OSDs have a config file that includes addresses for the mon daemons.
> We already have in place logic to cause a reconfig of OSDs if the mon map
> changes, but when we do we aren't actually regenerating the config
> so it's never updated with the new mon addresses. This change is to
> have us recreate the OSD config when we redeploy or reconfig an OSD
> so it gets the new mon addresses."
>
> You mentioned a network change. Maybe the orch failed to update
> /var/lib/ceph/$(ceph fsid)/*/config of some services, came back later and
> succeeded.
>
> Maybe that explains it.
>
> Cheers,
> Frédéric.
>
> ----- On Jul 17, 2024, at 16:22, Frédéric Nass frederic.nass@xxxxxxxxxxxxxxxx wrote:
>
> > ----- On Jul 17, 2024, at 15:53, Albert Shih Albert.Shih@xxxxxxxx wrote:
> >
> >> On 17/07/2024 at 09:40:59+0200, David C. wrote
> >> Hi everyone.
> >>
> >>>
> >>> The curiosity of Albert's cluster is that (msgr) v1 and v2 are present on the
> >>> mons, as well as on the osds backend.
> >>>
> >>> But v2 is absent on the public OSD and MDS network.
> >>>
> >>> The specific point is that the public network has been changed.
> >>>
> >>> At first, I thought it was the order of declaration of mon_host (v1 before v2),
> >>> but apparently that's not it.
> >>>
> >>>
> >>> On Wed, Jul 17, 2024 at 09:21, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>     Hi David,
> >>>
> >>>     Redeploying 2 out of 3 MONs a few weeks back (to have them using RocksDB to
> >>>     be ready for Quincy) prevented some clients from connecting to the cluster
> >>>     and mounting cephfs volumes.
> >>>
> >>>     Before the redeploy, these clients were using port 6789 (v1) explicitly as
> >>>     connections wouldn't work with port 3300 (v2).
> >>>     After the redeploy, removing port 6789 from mon_ips fixed the situation.
> >>>
> >>>     Seems like msgr v2 activation only occurred after all 3 MONs were
> >>>     redeployed and used RocksDB. Not sure why this happened though.
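Regarding the clients that had port 6789 hard-coded: as a rough illustration only (the <mon_ip> address, user name and keyfile path are placeholders, not taken from either cluster), the difference between a kernel cephfs mount pinned to msgr v1 and one asked to use msgr v2 would look roughly like this:

    # Explicit port 6789, no ms_mode: the client talks msgr v1 to that port.
    mount -t ceph <mon_ip>:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # Without an explicit port, ms_mode=prefer-crc (recent kernels) makes the
    # client use msgr v2 on port 3300; ms_mode also accepts crc, secure,
    # prefer-secure and legacy.
    mount -t ceph <mon_ip>:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,ms_mode=prefer-crc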
> >>>     @Albert, if this cluster has been upgraded several times, you might want to
> >>>     check /var/lib/ceph/$(ceph fsid)/kv_backend, redeploy the MONs if leveldb,
> >>>     make sure all clients use the new mon_host syntax in ceph.conf
> >>>     ([v2:<cthulhu1_ip>:3300,v1:<cthulhu1_ip>:6789], etc.) and check their
> >>>     ability to connect to port 3300.
> >>
> >> So it's working now, I can mount the cephfs from all my clients.
> >>
> >> Because I'm not sure what really happened and where the issue was, here is
> >> what was done on the cluster (in that timeline):
> >>
> >> When I changed the IP address of the server I perhaps made a mistake and put
> >>
> >> ceph mon set-addrs cthulhu1
> >> [v1:cthulhu1_new_ip:6789/0,v2:cthulhu1_new_ip:3300/0]
> >>
> >> Yesterday David changed it to the right way
> >>
> >> ceph mon set-addrs cthulhu1
> >> [v2:cthulhu1_new_ip:3300/0,v1:cthulhu1_new_ip:6789/0]
> >>
> >> but it was not enough, even after restarting all the OSDs.
> >>
> >> Then we tried some redeploying of the MDS --> no joy.
> >>
> >> This morning I restarted an OSD and noticed the restarted OSD was listening
> >> on v2 and v1, so I restarted all the OSDs.
> >>
> >> After that, every OSD was listening on v2 and v1.
> >>
> >> But I was still unable to mount the cephfs.
> >>
> >> I tried the option ms_mode=prefer-crc but nothing.
> >>
> >> So I ended up rebooting the whole cluster and now everything works fine.
> >>
> >> Thanks for your help.
> >
> > Great! Glad you figured it out.
> >
> > Frédéric.
> >
> >>
> >> Regards
> >> --
> >> Albert SHIH 🦫 🐸
> >> Observatoire de Paris
> >> France
> >> Heure locale/Local time:
> >> Wed, Jul 17, 2024 15:42:00 CEST
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
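To sum up the fix discussed above as a minimal sketch (reusing the mon name cthulhu1 and the placeholder addresses from the thread; adapt to your cluster), the address vector should list v2 before v1, and as seen above the OSDs only picked the change up after a restart:

    # List v2 (port 3300) before v1 (port 6789) when (re)setting a mon's addresses:
    ceph mon set-addrs cthulhu1 [v2:cthulhu1_new_ip:3300/0,v1:cthulhu1_new_ip:6789/0]

    # Verify the monmap now advertises v2 first:
    ceph mon dump -f json | jq '.mons[].public_addrs'

    # After restarting the OSDs, check that they register both v2 and v1 addresses:
    ceph report | jq '.osdmap.osds[].public_addrs'

Per the PR Frédéric quoted, redeploying or reconfiguring the OSDs through the orchestrator should also regenerate /var/lib/ceph/$(ceph fsid)/*/config with the new mon addresses.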