Re: 19.2.1: HEALTH_ERR 27 osds(s) are not reachable. (Yet working normally...)

Harry G Coin <hgcoin@xxxxxxxxx> · Mon, 10 Feb 2025 11:17:17 -0600

Hi Frédéric,

So that's another half-year added to the previous half-year wait for basic
IPv6 clusters to work properly.  If only 'ceph health mute' accomplished
the goal as a workaround.  Note that even when all complaints are
suppressed, the dashboard continues to show the flashing red warning dot
and the '! Cluster critical' notice.

I think the bug has two levels.  First, even when other warnings and
errors are suppressed, the error complaining that the cluster has been in
a health error state for more than 5 minutes remains.  Second, even when
that 'things have been bad for 5 minutes' warning is also silenced, the
'! Critical' notice and the flashing red 'Ceph is broken' dot remain.  All
this while the Alerts page under 'Observability' shows that all is well.

Ceph is good in the engine room, but the steering wheel and dashboard
need some work to match the advertising and the quality of the rest!

Harry

On 2/7/25 16:24, Frédéric Nass wrote:
Hi Harry,

It's a harmless bug [1] related to IPv6 clusters. It will be fixed
in v19.2.2. The workaround is to mute the error with 'ceph health mute
...'. That's all you can do for now.

Regards,
.

------------------------------------------------------------------------
*From:* Harry G Coin <hgcoin@xxxxxxxxx>
*Sent:* Friday, 7 February 2025 22:52
*To:* ceph-users
*Subject:* 19.2.1: HEALTH_ERR 27 osds(s) are not reachable. (Yet working normally...)

19.2.1 reports all OSDs as unreachable because their public addresses
are supposedly not in the public subnet.  However, they are all within
the subnet, and they are all working normally.

It's embarrassing for the dashboard to glow red as if the whole OSD
roster were crippled while everything is working normally.  This problem
also existed in the previous release, but things worked correctly
prior to 19.

Detail:

Note that for osd.0, the dashboard lists:

public_addr
[fc00:1002:c7::44]:6807/4160993080

But the logs show:

7/2/25 03:35 PM[ERR] osd.0's public address is not in 'fc00:1002:c7::/64' subnet

7/2/25 03:35 PM[ERR][ERR] OSD_UNREACHABLE: 27 osds(s) are not reachable

7/2/25 03:35 PM[ERR]Health detail: HEALTH_ERR 27 osds(s) are not reachable

However, according to the osd.0 attributes, its public address is well
inside the stated public subnet.
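To double-check the subnet claim outside Ceph, the public_addr shown by
the dashboard can be tested against the public network with Python's
standard ipaddress module.  This is only a minimal sketch: the helpers
parse_ceph_addr and in_public_network are hypothetical, not part of any
Ceph API, and they assume the address format seen above (an optional
"v1:"/"v2:" prefix, host:port with IPv6 hosts in brackets, then a
"/nonce" suffix).

```python
import ipaddress


def parse_ceph_addr(addr):
    """Extract the IP from a Ceph-style address such as
    "[fc00:1002:c7::44]:6807/4160993080" or "v2:10.0.0.1:6807/123".

    Hypothetical helper: assumes an optional "v1:"/"v2:" prefix, a
    host:port pair (IPv6 hosts in brackets) and a "/nonce" suffix.
    """
    # Drop an optional messenger-protocol prefix ("v1:" or "v2:").
    for prefix in ("v1:", "v2:"):
        if addr.startswith(prefix):
            addr = addr[len(prefix):]
            break
    # Drop the "/nonce" suffix, leaving "host:port".
    host_port = addr.split("/", 1)[0]
    if host_port.startswith("["):
        # IPv6: the host is everything between the brackets.
        host = host_port[1:host_port.index("]")]
    else:
        # IPv4: the host is everything before the last colon.
        host = host_port.rsplit(":", 1)[0]
    return ipaddress.ip_address(host)


def in_public_network(addr, network):
    """Return True if the address lies inside the given CIDR network."""
    return parse_ceph_addr(addr) in ipaddress.ip_network(network)


print(in_public_network("[fc00:1002:c7::44]:6807/4160993080",
                        "fc00:1002:c7::/64"))  # True
```

If this prints True for each OSD's public_addr, the addresses really are
inside the configured subnet, which is consistent with the error coming
from the health check itself rather than from the network configuration.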

All the OSDs are configured similarly, are working, and are all reported
as unreachable at the same time, for the same reason.

Tell me there's a way to fix this without waiting another half-year...

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx