Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)


 



On 25/03/2021 20:42, Dan van der Ster wrote:
netstat -anp | grep LISTEN | grep mgr

has it bound to 127.0.0.1 ?

(also check the other daemons).

If so this is another case of https://tracker.ceph.com/issues/49938
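
A quick way to run that check across all local Ceph daemons might look
roughly like the sketch below (illustrative only; it assumes iproute2's
"ss" is available and the standard ceph-mon/ceph-mgr/ceph-osd/ceph-mds
process names; netstat -anp works just as well):

  # List listening sockets per Ceph daemon type
  for proc in ceph-mon ceph-mgr ceph-osd ceph-mds; do
      echo "== ${proc} =="
      ss -tlnp 2>/dev/null | grep "${proc}" || echo "  (no listening sockets)"
  done

  # Anything on 127.0.0.1:<port> instead of the host's public IP matches
  # the symptom described in https://tracker.ceph.com/issues/49938
  ss -tlnp 2>/dev/null | grep -E 'ceph-(mon|mgr|osd|mds)' | grep '127.0.0.1' \
      && echo "WARNING: Ceph daemon(s) bound to loopback"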

Do you have any idea for a workaround (or should I downgrade)? I'm running Ceph on Ubuntu 18.04 LTS.

This seems to be happening on the mons/mgrs and OSDs.

Cheers

/Simon

-- dan

On Thu, Mar 25, 2021 at 8:34 PM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:

Hi

I'm in a bit of a panic :-(

Recently we started attempting to add a radosgw to our Ceph cluster,
which was until now only doing CephFS (and RBD was working as well).
We were messing about with ceph-ansible, as this was how we
originally installed the cluster. Anyway, it installed Nautilus
14.2.18 on the radosgw, and I thought it would be good to pull the
rest of the cluster up to that level as well using our tried and
tested Ceph upgrade script (it basically updates all Ceph nodes one
by one and checks whether Ceph is OK again before doing the next one).
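
For context, a minimal sketch of that kind of rolling-upgrade loop
(illustrative only, not the actual script; the hostnames, the use of
apt over ssh, and the HEALTH_OK check are assumptions):

  # Placeholder hostnames; the package list covers the daemons we run
  for host in cephmon1 cephmon2 cephmon3 cephosd01 cephosd02; do
      ssh "${host}" 'apt-get update -q'
      ssh "${host}" 'apt-get install -y --only-upgrade ceph ceph-base ceph-common ceph-mon ceph-mgr ceph-osd ceph-mds'
      ssh "${host}" 'systemctl restart ceph.target'
      # Do not move on until the cluster reports HEALTH_OK again
      until ceph health | grep -q HEALTH_OK; do
          sleep 30
      done
  done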

After the 3rd mon/mgr was done, all PGs were unavailable :-(
Obviously, the script did not continue, but Ceph is also broken now...

Deceptively, the message is just: HEALTH_WARN Reduced data
availability: 5568 pgs inactive

That's all PGs!

As a desperate measure I tried to upgrade one Ceph OSD node, but that
broke as well; the osd service on that node gets an interrupt from the
kernel...
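
For reference, the usual places to look on that node would be the
systemd unit and the daemon log (the OSD id below is only a
placeholder):

  # systemd's view of the failed OSD and its recent journal output
  systemctl status ceph-osd@103
  journalctl -u ceph-osd@103 --since "1 hour ago" | tail -n 100

  # The daemon's own log normally contains the abort/backtrace as well
  tail -n 100 /var/log/ceph/ceph-osd.103.log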

The versions are now:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}


12 OSDs are down

# ceph -s
    cluster:
      id:     b489547c-ba50-4745-a914-23eb78e0e5dc
      health: HEALTH_WARN
              Reduced data availability: 5568 pgs inactive

    services:
      mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
      mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
      mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
      osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

    data:
      pools:   12 pools, 5568 pgs
      objects: 0 objects, 0 B
      usage:   0 B used, 0 B / 0 B avail
      pgs:     100.000% pgs unknown
               5568 unknown

    progress:
      Rebalancing after osd.103 marked in
        [..............................]
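
A status like this (100% of PGs "unknown" and 0 B of reported usage)
generally means the active mgr is not receiving PG stats from the
OSDs, which would be consistent with the loopback-binding issue
referenced above. Two quick checks on the active mgr host (cephmon1
here; an illustrative sketch, not a definitive procedure):

  # Is ceph-mgr listening on 127.0.0.1 instead of the public address?
  ss -tlnp | grep ceph-mgr

  # Cross-check the cluster's own record of the active mgr and its address
  ceph mgr dump | grep -i active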


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


