Re: 19.2.1: HEALTH_ERR 27 osds(s) are not reachable. (Yet working normally...)

Hi Frédéric,

So that's another half year added to the previous half-year wait for basic IPv6 clusters, then. If only 'ceph health mute' accomplished the goal as a workaround. Notice that even when all complaints are 'suppressed', the dashboard continues to show the 'flashing red warning dot' and the '! Cluster critical' advice.

I think that bug has two levels. First, even when other warnings/errors are suppressed, the error that complains of being in a health error state for more than 5 minutes remains. Second, even when the 'things have been bad for 5 minutes' warning is also silenced, the '! Critical' advice and the flashing red 'ceph is broken' dot remain. Meanwhile, under 'observability', Alerts shows all is well.

Ceph is good in the engine room, but the steering wheel and dashboard need some work to match the advertising and the quality of the rest!

Harry


On 2/7/25 16:24, Frédéric Nass wrote:
Hi Harry,

It's a harmless bug [1] related to IPv6 clusters. It will be fixed in v19.2.2. The workaround is to mute the error with 'ceph health mute ...'. That's all you can do for now.

Regards,

------------------------------------------------------------------------
*From:* Harry G Coin <hgcoin@xxxxxxxxx>
*Sent:* Friday, 7 February 2025 22:52
*To:* ceph-users
*Subject:* 19.2.1: HEALTH_ERR 27 osds(s) are not reachable. (Yet working normally...)

19.2.1 complains that all OSDs are unreachable because their public
address isn't in the public subnet.  However, they are all within the
subnet, and are working normally as well.

It's embarrassing for the dashboard to glow red over a totally crippled
OSD roster while all is working normally.  This existed in the
previous release, but was working prior to 19.

Detail:

Notice, for osd.0, the dashboard lists

public_addr
[fc00:1002:c7::44]:6807/4160993080

But, we have in the logs:

7/2/25 03:35 PM[ERR] osd.0's public address is not in 'fc00:1002:c7::/64' subnet

7/2/25 03:35 PM[ERR][ERR] OSD_UNREACHABLE: 27 osds(s) are not reachable

7/2/25 03:35 PM[ERR]Health detail: HEALTH_ERR 27 osds(s) are not reachable

However, as per the osd.0 attributes, the public address for osd.0 is
well inside the stated public subnet.
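
The membership claim can be checked independently of Ceph. Below is a
minimal sketch using Python's standard ipaddress module. The helper name
parse_public_addr is hypothetical, and the assumption that the dashboard
value has the form '[addr]:port/nonce' is taken only from the osd.0 line
quoted above. It says nothing about how Ceph itself performs the check.

```python
import ipaddress


def parse_public_addr(entry: str) -> ipaddress.IPv6Address:
    """Extract the IPv6 address from a '[addr]:port/nonce' string.

    Hypothetical helper: the format is assumed from the osd.0 value
    shown in the dashboard, not from any Ceph specification.
    """
    host = entry[entry.index("[") + 1 : entry.index("]")]
    return ipaddress.IPv6Address(host)


# Values copied from the dashboard and the log message above.
addr = parse_public_addr("[fc00:1002:c7::44]:6807/4160993080")
subnet = ipaddress.ip_network("fc00:1002:c7::/64")

# True when the first 64 bits of the address match the subnet prefix.
in_subnet = addr in subnet
print(addr, "in", subnet, "->", in_subnet)
```

With these values the check reports that the address is inside the
subnet, which matches the observation that the OSDs are working.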

All the OSDs are similarly configured, working, and reported as
unreachable at the same time, for the same reason.

Tell me there's a way to fix this without waiting a further half year....

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx