Hi
I'm in a bit of a panic :-(
Recently we started attempting to configure a radosgw for our ceph
cluster, which until now was only doing cephfs (and rbd was working as
well). We were messing about with ceph-ansible, as that was how we
originally installed the cluster. Anyway, it installed nautilus 14.2.18
on the radosgw node, and I thought it would be good to pull the rest of
the cluster up to that level as well using our tried and tested ceph
upgrade script (it basically updates all ceph nodes one by one and
checks whether ceph is ok again before doing the next; a rough sketch is
below).
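For reference, the script does roughly the following (a minimal sketch, not
our actual script; the node list, the yum-based update and the ceph.target
restart are assumptions about how such a rolling-upgrade loop typically
looks):

#!/bin/bash
# Rolling upgrade: one node at a time, wait for HEALTH_OK before the next.
NODES="cephmon1 cephmon2 cephmon3 cephosd1 cephosd2"   # placeholder host list

for node in $NODES; do
    echo "upgrading $node ..."
    ssh "$node" "yum -y update 'ceph*' && systemctl restart ceph.target"
    # block until the cluster reports HEALTH_OK again
    until ceph health | grep -q HEALTH_OK; do
        sleep 30
    done
done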
After the 3rd mon/mgr was done, all PGs became unavailable :-(
Obviously the script did not continue, but ceph is also broken now...
The deceptively mild message is: HEALTH_WARN Reduced data availability:
5568 pgs inactive
That's all PGs!
As a desperate measure I tried to upgrade one ceph OSD node, but that
broke as well: the OSD service on that node gets an interrupt from the
kernel...
The versions are now as follows:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}
12 OSDs are down
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            Reduced data availability: 5568 pgs inactive

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
    mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

  data:
    pools:   12 pools, 5568 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5568 unknown

  progress:
    Rebalancing after osd.103 marked in
      [..............................]