Re: 16.2.6: clients being incorrectly directed to the OSDs cluster_network address

Just curious, does it always happen with the same OSDs?
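
One quick way to compare, assuming the standard Pacific CLI (the object
name below is just a placeholder; the command computes placement, so the
object does not need to exist), is to map an object in each pool and look
at the up/acting sets:

>> ceph osd map cinder-volumes test-object
>> ceph osd map ephemeral-vms test-object

If the two pools land on the same OSDs, that would point away from a
per-OSD problem.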

On 09/28 16:14, Javier Cacheiro wrote:
> Interestingly enough, this happens for some pools but not for others.
> 
> For example, I have just realized that when connecting to another
> pool the client is correctly directed to the OSD's public_network address:
> 
> >> strace -f -e trace=network -s 10000 rbd ls --pool cinder-volumes --name
> client.cinder 2>&1 | grep sin_addr
> [pid 2363212] connect(15, {sa_family=AF_INET, sin_port=htons(6816),
> sin_addr=inet_addr("10.113.29.7")}, 16) = 0
> 
> But the same client listing the ephemeral-vms pool is directed to the OSD's
> cluster_network address:
> >> strace -f -e trace=network -s 10000 rbd ls --pool ephemeral-vms --name
> client.cinder 2>&1 | grep sin_addr
> [pid 2363485] connect(14, {sa_family=AF_INET, sin_port=htons(6806),
> sin_addr=inet_addr("10.114.29.10")}, 16) = -1 EINPROGRESS (Operation now
> in progress)
> 
> Very weird!
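> 
> My next step is to compare what each OSD advertises in the osdmap; as far
> as I understand, each osd line of the dump shows the public address vector
> first and the cluster address vector second:
> 
> >> ceph osd dump | grep '^osd'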
> 
> 
> 
> On Tue, 28 Sept 2021 at 16:02, Javier Cacheiro <Javier.Cacheiro@xxxxxxxxx>
> wrote:
> 
> > Hi all,
> >
> > I am trying to understand an issue with Ceph directing clients to connect
> > to OSDs through their cluster_network address instead of their
> > public_network address.
> >
> > I have configured a Ceph cluster with both a public and a cluster network:
> >
> > >> ceph config dump | grep network
> > global   advanced  cluster_network     10.114.0.0/16       *
> >   mon    advanced  public_network      10.113.0.0/16       *
> >
> > I upgraded the cluster from 16.2.4 to 16.2.6.
> >
> > After that, I am seeing that Ceph directs clients to connect to the OSDs'
> > cluster_network address instead of their public_network address:
> >
> > >> strace -f -e trace=network -s 10000 rbd ls --pool ephemeral-vms --name
> > client.cinder
> > ....
> > [pid 2353692] connect(14, {sa_family=AF_INET, sin_port=htons(6806),
> > sin_addr=inet_addr("10.114.29.10")}, 16) = -1 EINPROGRESS (Operation
> > now in progress)
> >
> > In this case the client hangs because it cannot reach that address,
> > since it's an internal address.
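> >
> > To see which addresses that daemon actually registered, I can also query
> > its metadata (the osd id 10 below is only a guess from the address, as an
> > example):
> >
> > >> ceph osd metadata 10 | grep -E '"front_addr"|"back_addr"'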
> >
> > This appeared after upgrading to 16.2.6, but I am not sure whether it was
> > caused by the upgrade itself or was a hidden issue that only surfaced
> > after the nodes were rebooted.
> >
> > It could also be that I am missing something in the config, but the config
> > was generated by the cephadm bootstrap command rather than written by
> > hand, and it worked before the upgrade/reboot, so I am fairly confident
> > in it.
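> >
> > One thing I do notice is that public_network is only set at mon scope.
> > If the OSDs are expected to read it too, a possible workaround (just a
> > guess on my side, untested) would be to set it globally and then restart
> > the OSDs:
> >
> > >> ceph config set global public_network 10.113.0.0/16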
> >
> > What do you think: could this be a bug, or is it more likely a
> > misconfiguration on my side?
> >
> > Thanks,
> > Javier
> >
> >
> >

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
