OSD_UNREACHABLE After Upgrade to 17.2.8 – Issue with Public Network Detection

Hello everyone,

After upgrading our Ceph cluster from 17.2.7 to 17.2.8 using `cephadm`, all
OSDs are reported as unreachable with the following error:

```
HEALTH_ERR 32 osds(s) are not reachable
[ERR] OSD_UNREACHABLE: 32 osds(s) are not reachable
    osd.0's public address is not in '172.20.180.1/32,172.20.180.0/24' subnet
    osd.1's public address is not in '172.20.180.1/32,172.20.180.0/24' subnet
    ...
    osd.31's public address is not in '172.20.180.1/32,172.20.180.0/24' subnet
```

However, all OSDs actually have IP addresses within the `172.20.180.0/24`
subnet. The cluster remains functional (CephFS is accessible), and muting
the error with `ceph health mute OSD_UNREACHABLE --sticky` lets us operate
normally, but the underlying issue persists.
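For reference, this is the mute I applied, plus how it can be lifted again once the root cause is fixed (standard health-mute commands, nothing cluster-specific):

```
# silence the OSD_UNREACHABLE health code until it is explicitly unmuted
ceph health mute OSD_UNREACHABLE --sticky

# remove the mute once the underlying issue is resolved
ceph health unmute OSD_UNREACHABLE
```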

### **Environment Details**
- **Ceph Version:** 17.2.8 (f817ceb7f187defb1d021d6328fa833eb8e943b3) Quincy (stable)
- **Container Image:** `quay.io/ceph/ceph@sha256:a0f373aaaf5a5ca5c4379c09da24c771b8266a09dc9e2181f90eacf423d7326f`
- **OS:** CentOS Stream 9
- **Deployment Method:** cephadm

### **Network Configuration**
The network settings appear correct:
```
ceph config get osd public_network
172.20.180.0/24

ceph config get osd cluster_network
172.20.180.0/24

ceph config get mon public_network
172.20.180.0/24

ceph config get mgr public_network
172.20.180.0/24
```
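For completeness, every scope that sets one of these options can also be listed in one go (plain `ceph config dump` filtered with grep; output omitted here):

```
# show all *_network settings across scopes (global, mon, mgr, osd, ...)
ceph config dump | grep -i network
```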
OSD metadata confirms that the assigned addresses are in the correct
subnet:
```
"front_addr": "[v2:
172.20.180.126:6800/87977614,v1:172.20.180.126:6801/87977614]"
```
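For anyone reproducing this, the address fields can be pulled per OSD roughly like this (osd.0 is just an example id):

```
# print the public (front) and cluster (back) address fields from one OSD's metadata
ceph osd metadata 0 | grep -E '"(front|back)_addr"'
```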
### **Current Cluster Status**
- All OSDs are **up** and **in**, and `ceph osd status` reports them as operational.
- `ceph orch ps` confirms that OSD services are running.
- `ceph mon stat` shows all monitors in quorum.
- Cluster services (CephFS, RBD) are working as expected despite the error (the checks are summarized below).
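For completeness, these are the checks behind the list above (commands as named there, plus `ceph -s` for the overall picture):

```
# overall cluster health and per-OSD view
ceph -s
ceph osd status

# daemon state under cephadm and monitor quorum
ceph orch ps
ceph mon stat
```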

### **Questions**
1. Has anyone else encountered this issue after upgrading to 17.2.8?
2. Is this a known regression? (This seems similar to issue #67517.)
3. Would upgrading to Ceph 18.x (Reef) resolve the problem?
4. Is there any solution other than muting the health warning?

Any insights or recommendations would be greatly appreciated!

-- 
Best regards,
Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


