Does netstat -anp | grep LISTEN | grep mgr show it bound to 127.0.0.1? (Also check the other daemons.) If so, this is another case of https://tracker.ceph.com/issues/49938

-- dan

On Thu, Mar 25, 2021 at 8:34 PM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>
> Hi
>
> I'm in a bit of a panic :-(
>
> Recently we started attempting to configure a radosgw for our ceph
> cluster, which until now was only doing cephfs (and rbd was working as
> well). We were messing about with ceph-ansible, as this was how we
> originally installed the cluster. Anyway, it installed nautilus 14.2.18
> on the radosgw and I thought it would be good to pull the rest of the
> cluster up to that level as well, using our tried and tested ceph upgrade
> script (it basically updates all ceph nodes one by one and checks
> whether ceph is OK again before doing the next).
>
> After the 3rd mon/mgr was done, all PGs became unavailable :-(
> Obviously, the script is not continuing, but ceph is also broken now...
>
> The message, deceptively, is: HEALTH_WARN Reduced data availability: 5568
> pgs inactive
>
> That's all PGs!
>
> As a desperate measure I tried to upgrade one ceph OSD node, but that
> broke as well; the osd service on that node gets an interrupt from the
> kernel...
>
> The versions are now:
>
> 20:29 [root@cephmon1 ~]# ceph versions
> {
>     "mon": {
>         "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
>     },
>     "mgr": {
>         "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
>     },
>     "osd": {
>         "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
>     },
>     "mds": {
>         "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
>     },
>     "overall": {
>         "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
>         "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
>     }
> }
>
> 12 OSDs are down.
>
> # ceph -s
>   cluster:
>     id:     b489547c-ba50-4745-a914-23eb78e0e5dc
>     health: HEALTH_WARN
>             Reduced data availability: 5568 pgs inactive
>
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
>     mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
>     mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
>     osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs
>
>   data:
>     pools:   12 pools, 5568 pgs
>     objects: 0 objects, 0 B
>     usage:   0 B used, 0 B / 0 B avail
>     pgs:     100.000% pgs unknown
>              5568 unknown
>
>   progress:
>     Rebalancing after osd.103 marked in
>       [..............................]
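
A minimal sketch of the bind-address check Dan suggests, run on each mon/mgr host; the exact commands and the /etc/hosts cause are assumptions, not confirmed in this thread:

# Which addresses are the ceph daemons listening on? If ceph-mgr or
# ceph-mon shows 127.0.0.1 instead of the host's public IP, the symptom
# matches https://tracker.ceph.com/issues/49938.
ss -tlnp | grep -E 'ceph-(mon|mgr|mds|osd)'

# Cross-check the addresses the cluster itself has recorded.
ceph mon dump
ceph mgr dump | grep -i addr

# One possible cause (an assumption here): the host's own name resolving
# to a loopback address in /etc/hosts.
getent hosts "$(hostname)"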
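
The upgrade script Simon mentions isn't shown; a hypothetical sketch of that kind of rolling upgrade (one node at a time, waiting for the cluster to settle before continuing) could look like the following, with placeholder host names and assuming yum-managed packages:

#!/usr/bin/env bash
# Hypothetical rolling-upgrade loop, as described in the thread:
# update one node, wait until ceph reports HEALTH_OK, then do the next.
# Host names are placeholders, not the actual cluster inventory.
set -euo pipefail

NODES="cephmon1 cephmon2 cephmon3 cephosd01 cephosd02"

for node in $NODES; do
    echo "Upgrading $node ..."
    ssh "$node" 'yum -y update "ceph*" && systemctl restart ceph.target'

    # Block until the cluster is healthy again before touching the next node.
    until ceph health | grep -q HEALTH_OK; do
        sleep 30
    done
done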