Does netstat -anp | grep LISTEN | grep mgr show it bound to 127.0.0.1? (Also check the other daemons.) If so, this is another case of https://tracker.ceph.com/issues/49938

-- dan

On Thu, Mar 25, 2021 at 8:34 PM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>
> Hi
>
> I'm in a bit of a panic :-(
>
> Recently we started attempting to configure a radosgw for our ceph
> cluster, which until now was only doing cephfs (and rbd was working as
> well). We were messing about with ceph-ansible, as this was how we
> originally installed the cluster. Anyway, it installed nautilus 14.2.18
> on the radosgw and I thought it would be good to pull the rest of the
> cluster up to that level as well, using our tried and tested ceph upgrade
> script (it basically updates all ceph nodes one by one and checks
> whether ceph is OK again before doing the next).
>
> After the 3rd mon/mgr was done, all PGs became unavailable :-(
> Obviously, the script is not continuing, but ceph is also broken now...
>
> The message, deceptively, is: HEALTH_WARN Reduced data availability: 5568
> pgs inactive
>
> That's all PGs!
>
> As a desperate measure I tried to upgrade one ceph OSD node, but that
> broke as well; the osd service on that node gets an interrupt from the
> kernel...
>
> The versions are now:
>
> 20:29 [root@cephmon1 ~]# ceph versions
> {
>     "mon": {
>         "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
>     },
>     "mgr": {
>         "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
>     },
>     "osd": {
>         "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
>     },
>     "mds": {
>         "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
>     },
>     "overall": {
>         "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
>         "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
>     }
> }
>
> 12 OSDs are down.
>
> # ceph -s
>   cluster:
>     id:     b489547c-ba50-4745-a914-23eb78e0e5dc
>     health: HEALTH_WARN
>             Reduced data availability: 5568 pgs inactive
>
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
>     mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
>     mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
>     osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs
>
>   data:
>     pools:   12 pools, 5568 pgs
>     objects: 0 objects, 0 B
>     usage:   0 B used, 0 B / 0 B avail
>     pgs:     100.000% pgs unknown
>              5568 unknown
>
>   progress:
>     Rebalancing after osd.103 marked in
>       [..............................]
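
A minimal sketch of the bind-address check Dan suggests, run on each mon/mgr host; the exact commands and the /etc/hosts cause are assumptions, not confirmed in this thread:

# Which addresses are the ceph daemons listening on? If ceph-mgr or
# ceph-mon shows 127.0.0.1 instead of the host's public IP, the symptom
# matches https://tracker.ceph.com/issues/49938.
ss -tlnp | grep -E 'ceph-(mon|mgr|mds|osd)'

# Cross-check the addresses the cluster itself has recorded.
ceph mon dump
ceph mgr dump | grep -i addr

# One possible cause (an assumption here): the host's own name resolving
# to a loopback address in /etc/hosts.
getent hosts "$(hostname)"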
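
The upgrade script Simon mentions isn't shown; a hypothetical sketch of that kind of rolling upgrade (one node at a time, waiting for the cluster to settle before continuing) could look like the following, with placeholder host names and assuming yum-managed packages:

#!/usr/bin/env bash
# Hypothetical rolling-upgrade loop, as described in the thread:
# update one node, wait until ceph reports HEALTH_OK, then do the next.
# Host names are placeholders, not the actual cluster inventory.
set -euo pipefail

NODES="cephmon1 cephmon2 cephmon3 cephosd01 cephosd02"

for node in $NODES; do
    echo "Upgrading $node ..."
    ssh "$node" 'yum -y update "ceph*" && systemctl restart ceph.target'

    # Block until the cluster is healthy again before touching the next node.
    until ceph health | grep -q HEALTH_OK; do
        sleep 30
    done
done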