Re: Unable to mount with 18.2.2

Thank you for your research, Frédéric,

We checked, and the conf files were up to date, in the form
[v1:(...),v2:(...)]
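
For comparison, the recommended new-style syntax (see Frédéric's note
further down) puts v2 first; a minimal sketch with a placeholder IP:

  mon_host = [v2:192.0.2.10:3300,v1:192.0.2.10:6789]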

I managed to reproduce the "incident":

[aevoo-test - ceph-0]# ceph mon dump -f json|jq '.mons[].public_addrs'
dumped monmap epoch 2
{
  "addrvec": [
    {
      "type": "v2",
      "addr": "(IP):3300",
      "nonce": 0
    },
    {
      "type": "v1",
      "addr": "(IP):6789",
      "nonce": 0
    }
  ]
}
[aevoo-test - ceph-0]# ceph report |jq '.osdmap.osds[].public_addrs'
report 1958031869
{
  "addrvec": [
    {
      "type": "v2",
      "addr": "(IP):6800",
      "nonce": 116888
    },
    {
      "type": "v1",
      "addr": "(IP):6801",
      "nonce": 116888
    }
  ]
}
[aevoo-test - ceph-0]# ceph mon set-addrs home [v1:(IP):6789/0,v2:(IP):3300/0]
[aevoo-test - ceph-0]# ceph mon dump -f json|jq '.mons[].public_addrs'
dumped monmap epoch 3
{
  "addrvec": [
    {
      "type": "v1",
      "addr": "(IP):6789",
      "nonce": 0
    },
    {
      "type": "v2",
      "addr": "(IP):3300",
      "nonce": 0
    }
  ]
}

After an OSD restart:

[aevoo-test - ceph-0]# ceph report |jq '.osdmap.osds[].public_addrs'
report 2993464839
{
  "addrvec": [
    {
      "type": "v1",
      "addr": "(IP):6801",
      "nonce": 117895
    }
  ]
}
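
If I read this right, the v1-first set-addrs is what drops v2 from the
OSDs' public_addrs after a restart. Presumably re-running it with v2
first (as David did, see below) and restarting the OSDs brings both
back; a sketch:

  ceph mon set-addrs home [v2:(IP):3300/0,v1:(IP):6789/0]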




On Thu, Jul 18, 2024 at 12:30, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>
wrote:

> Hi Albert, David,
>
> I came across this: https://github.com/ceph/ceph/pull/47421
>
> "OSDs have a config file that includes addresses for the mon daemons.
> We already have in place logic to cause a reconfig of OSDs if the mon map
> changes, but when we do we aren't actually regenerating the config
> so it's never updated with the new mon addresses. This change is to
> have us recreate the OSD config when we redeploy or reconfig an OSD
> so it gets the new mon addresses."
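>
> So, with cephadm, a redeploy or reconfig via the orchestrator should
> regenerate that config; a sketch (service name as an example):
>
>   ceph orch reconfig osd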
>
> You mentioned a network change. Maybe the orch failed to update
> /var/lib/ceph/$(ceph fsid)/*/config for some services, then came back later
> and succeeded.
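>
> A quick way to check would be something like:
>
>   grep mon_host /var/lib/ceph/$(ceph fsid)/osd.*/config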
>
> Maybe that explains it.
>
> Cheers,
> Frédéric.
>
> ----- On Jul 17, 2024, at 16:22, Frédéric Nass frederic.nass@xxxxxxxxxxxxxxxx
> wrote:
>
> > ----- On Jul 17, 2024, at 15:53, Albert Shih Albert.Shih@xxxxxxxx wrote:
> >
> >> On 17/07/2024 at 09:40:59+0200, David C. wrote:
> >> Hi everyone.
> >>
> >>>
> >>> The curious thing about Albert's cluster is that (msgr) v1 and v2 are
> >>> present on the mons, as well as on the OSD backend.
> >>>
> >>> But v2 is absent from the OSD and MDS public network.
> >>>
> >>> The specific point is that the public network has been changed.
> >>>
> >>> At first, I thought it was the order of declaration of mon_host (v1
> >>> before v2), but apparently that's not it.
> >>>
> >>>
> >>> On Wed, Jul 17, 2024 at 09:21, Frédéric Nass
> >>> <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>     Hi David,
> >>>
> >>>     Redeploying 2 out of 3 MONs a few weeks back (to have them using
> >>>     RocksDB, to be ready for Quincy) prevented some clients from
> >>>     connecting to the cluster and mounting cephfs volumes.
> >>>
> >>>     Before the redeploy, these clients were using port 6789 (v1)
> >>>     explicitly, as connections wouldn't work with port 3300 (v2).
> >>>     After the redeploy, removing port 6789 from mon_ips fixed the
> >>>     situation.
> >>>
> >>>     It seems msgr v2 activation only occurred after all 3 MONs were
> >>>     redeployed and used RocksDB. Not sure why this happened, though.
> >>>
> >>>     @Albert, if this cluster has been upgraded several times, you might
> >>>     want to check /var/lib/ceph/$(ceph fsid)/kv_backend, redeploy the
> >>>     MONs if it says leveldb, make sure all clients use the new mon_host
> >>>     syntax in ceph.conf ([v2:<cthulhu1_ip>:3300,v1:<cthulhu1_ip>:6789],
> >>>     etc.) and check their ability to connect to port 3300.
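> >>>
> >>>     As a sketch (paths may vary by deployment):
> >>>
> >>>       cat /var/lib/ceph/$(ceph fsid)/*/kv_backend   # should say rocksdb
> >>>       nc -zv <mon_ip> 3300                          # test v2 reachability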
> >>
> >> So it's working now; I can mount the cephfs from all my clients.
> >>
> >> Since I'm not sure what really happened and where the issue was, here is
> >> what was done on the cluster (in that timeline):
> >>
> >>  When I changed the IP address of the server, I may have made a mistake
> >>  and put
> >>
> >>    ceph mon set-addrs cthulhu1
> >>    [v1:cthulhu1_new_ip:6789/0,v2:cthulhu1_new_ip:3300/0]
> >>
> >>  Yesterday David changed it to the right way:
> >>
> >>    ceph mon set-addrs cthulhu1
> >>    [v2:cthulhu1_new_ip:3300/0,v1:cthulhu1_new_ip:6789/0]
> >>
> >>  but it was not enough, even after restarting all the OSDs.
> >>
> >>  Then we tried redeploying the MDS --> no joy.
> >>
> >>  This morning I restarted an OSD and noticed the restarted OSD was
> >>  listening on v2 and v1, so I restarted all the OSDs.
> >>
> >>  After that, every OSD was listening on v2 and v1.
> >>
> >>  But I was still unable to mount the cephfs.
> >>
> >>  I tried the option ms_mode=prefer-crc, but nothing changed.
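> >>
> >>  For reference, that option goes on the kernel cephfs mount; a sketch
> >>  with placeholder names, not my exact command:
> >>
> >>    mount -t ceph user@<fsid>.<fs_name>=/ /mnt -o ms_mode=prefer-crc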
> >>
> >>  So I ended up rebooting the whole cluster, and now everything works fine.
> >>
> >> Thanks for your help.
> >
> > Great! Glad you figured it out.
> >
> > Frédéric.
> >
> >>
> >> Regards
> >> --
> >> Albert SHIH 🦫 🐸
> >> Observatoire de Paris
> >> France
> >> Heure locale/Local time:
> >> Wed, 17 Jul 2024 15:42:00 CEST
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



