Hi Frédéric
Everything was normal in v18; after 19.2 the problem remains even though the
addresses are different:
cluster_network global: fc00:1000:0:b00::/64
public_network global: fc00:1002:c7::/64
Also, after rebooting everything in sequence, the only complaint is that the
27 OSDs, which are all up, in, and working normally, are still reported as
"not reachable".
~# ceph -s
  cluster:
    id:     ...
    health: HEALTH_ERR
            27 osds(s) are not reachable

  services:
    ...
    osd: 27 osds: 27 up (since 6m), 27 in (since 12d)
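For reference, a sketch of the commands to see exactly which OSDs the check flags and which networks the cluster thinks are configured (standard ceph CLI, nothing specific to this cluster):
~# ceph health detail     # lists each OSD the check flags, and the reason given
~# ceph config dump | grep -E 'public_network|cluster_network'
~# ceph config get mon public_network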
On 10/16/24 03:44, Frédéric Nass wrote:
Hi Harry,
Do you have 'cluster_network' set to the same subnet as 'public_network', like in the issue [1]? It doesn't make much sense to set up a cluster_network when it's no different from the public_network.
Maybe that's what triggers the OSD_UNREACHABLE check recently added here [2] (even though the code seems to only consider IPv4 addresses, which is odd, btw).
I suggest removing the cluster_network and restarting a single OSD to see if the counter decreases.
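A sketch of what that might look like on an orchestrator-managed (cephadm/Docker) cluster (osd.0 is only an example id, pick any one of yours):
~# ceph config rm global cluster_network
~# ceph orch daemon restart osd.0
~# ceph -s      # check whether the "not reachable" count drops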
Regards,
Frédéric.
[1] https://tracker.ceph.com/issues/67517
[2] https://github.com/ceph/ceph/commit/5b70a6b92079f9e9d5d899eceebc1a62dae72997
----- On 16 Oct 24, at 3:02, Harry G Coin <hgcoin@xxxxxxxxx> wrote:
Thanks for the suggestion! I did that; the result was no change to the
problem, but with the added ceph -s complaint "Public/cluster network
defined, but can not be found on any host" -- with otherwise totally
normal cluster operations. Go figure. How can ceph -s be so totally
wrong, with the dashboard reporting critical problems, when there are
none? It makes me really wonder whether any actual testing on IPv6 is
ever done before releases are marked 'stable'.
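For what it's worth, a sketch of how to confirm the hosts really do carry addresses in those IPv6 prefixes (plain iproute2 plus the orchestrator's host list; the grep pattern is just the prefixes quoted at the top of the thread):
~# ip -6 addr show | grep -E 'fc00:1002:c7|fc00:1000:0:b00'
~# ceph orch host ls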
HC
On 10/14/24 21:04, Anthony D'Atri wrote:
Try failing over to a standby mgr
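A sketch, using the standard mgr failover command (the daemon name is optional; without it the currently active mgr is failed over):
~# ceph mgr fail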
On Oct 14, 2024, at 9:33 PM, Harry G Coin <hgcoin@xxxxxxxxx> wrote:
I need help removing a useless "HEALTH_ERR" in 19.2.0 on a fully dual-stack
Docker setup, with Ceph using IPv6, public and private networks separated, and a
few servers. After upgrading from an error-free v18 release, I can't get rid of
the 'health err' owing to the report that all OSDs are unreachable. Meanwhile,
ceph -s reports all OSDs up and in, and the cluster otherwise operates normally.
I don't care whether it's 'a real fix'; I just need to remove the false error
report. Any ideas?
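If simply silencing it is acceptable, a sketch (assuming the health check code is OSD_UNREACHABLE, per the commit referenced elsewhere in the thread):
~# ceph health mute OSD_UNREACHABLE       # an optional TTL, e.g. 24h, can be appended
~# ceph health unmute OSD_UNREACHABLE     # to bring the check back later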
Thanks
Harry Coin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx