Re: post-mortem of a ceph disruption

Stefan Kooman <stefan@xxxxxx> · Wed, 26 Oct 2022 10:57:28 +0200

On 10/25/22 17:08, Simon Oosthoek wrote:

At this point, one of us noticed that a strange IP address was mentioned: 
169.254.0.2. It turns out that a recently added package (openmanage) and 
some configuration had added this interface and address to hardware 
nodes from Dell. For us, our single-interface assumption is now out the 
window, and 0.0.0.0/0 is a bad idea in /etc/ceph/ceph.conf for public and 
cluster network (though it's the same network for us).

Our 3 datacenters are on three different subnets so it becomes a bit 
difficult to make it more specific. The nodes are all under the same 
/16, so we can choose that, but it is starting to look like a weird 
network setup.
I've always thought that this configuration was kind of non-intuitive 
and I still do. And now it has bitten us :-(

Thanks for reading and if you have any suggestions on how to fix/prevent 
this kind of error, we'll be glad to hear it!

We don't have public_network specified in our cluster(s). AFAIK it's 
not needed (anymore). There is no default network address range 
configured, so I would just get rid of it. Same for cluster_network if 
you have that configured. There, I fixed it! ;-)

If you don't use IPv6, I would explicitly turn it off:

ms_bind_ipv6 = false
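
As a minimal sketch of where that line would go, assuming you keep such settings in the [global] section of /etc/ceph/ceph.conf (the ms_bind_ipv4 line is my addition for symmetry and is, as far as I know, already the default):

```ini
[global]
# Bind only to IPv4 addresses; assumes an IPv4-only cluster.
ms_bind_ipv4 = true
ms_bind_ipv6 = false
```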

The Ceph daemons and clients need to know which monitors there are and 
what their addresses are: that is the important part (mon_host). IPs of 
OSDs are available in the osd map, IPs of MDSs in the mds map, etc., and 
the clients will request those from the monitors when needed.
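
For illustration, a ceph.conf that only tells daemons and clients where the monitors are could look roughly like this. The three addresses are hypothetical placeholders from the documentation ranges, one per datacenter, not values from your cluster:

```ini
[global]
# Hypothetical monitor addresses, one per datacenter; replace with your own.
mon_host = 192.0.2.11, 198.51.100.11, 203.0.113.11
```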

If you want to hardcode each IP a daemon has to listen on, that is 
possible. You can create daemon-specific entries for the IP they have to 
bind to (public_bind_addr IIRC).
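
A rough sketch of such a daemon-specific entry follows. It assumes the per-daemon options are public_addr and cluster_addr (as I understand it, public_bind_addr is mainly meant for monitors that bind to a different address than the one they advertise), so check the option names against the documentation for your release. The daemon ID and address are hypothetical:

```ini
[osd.0]
# Hypothetical address; pins this one OSD to a specific IP.
public_addr = 192.0.2.21
# Same address here because public and cluster network are the same in this setup.
cluster_addr = 192.0.2.21
```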

Gr. Stefan