Re: Cluster degraded after adding OSDs to increase capacity


 



Doubling the capacity in one shot was a big topology change, hence the 53% misplaced.

OSD fullness naturally follows a bell curve; there will always be a tail of under-full and over-full OSDs.  Even if you hadn't said that your cluster was very full before the expansion, I would have predicted it from the full / nearfull OSDs.

Think of CRUSH as a hash function that can experience collisions.  When you change the topology, some collisions go away, and sometimes PGs newly land on OSDs from which they had previously been redirected, which can result in additional fillage.   This can also happen simply as a natural consequence of data moving onto a given OSD before other data has moved off, especially since Ceph copies a PG to its new location before deleting the old copy during a move, to maintain full redundancy along the way.


`ceph osd df | sort -nk8`   # sort per-OSD fullness to surface the outliers; adjust the column index for your release's output
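
A couple of read-only checks are also handy here (these should work as-is on Nautilus): `ceph health detail` names the specific nearfull OSD and the backfill_toofull PGs, and the osdmap dump shows the configured thresholds:

    # which OSD / PGs are tripping the thresholds
    ceph health detail
    # the currently configured nearfull / backfillfull / full ratios
    ceph osd dump | grep ratio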


Couple of ways to recover, depending on the release you're running (which you didn't mention); rough command sketches follow the list below.  Either way, you'll need to squeeze the most-full outliers down on a continual basis going forward.

* Balance OSDs with the ceph-mgr pg-upmap balancer (if all clients are Luminous or newer)
* Balance OSDs with reweight-by-utilization
* Balance OSDs with override weights `ceph osd reweight osd.666 0.xx`
* Raise the osd full ratio and backfill full ratio a few percentage points to let the 3 affected OSDs drain.  You may need to restart them serially for the new setting to take effect.
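
Rough sketches of the commands behind those options; the ratio values below are examples only, and the defaults I'm quoting (0.85 / 0.90 / 0.95) are from memory, so verify against your own cluster before applying anything:

    # pg-upmap balancer (only if `ceph features` shows every client at luminous or newer)
    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status

    # reweight-by-utilization: dry-run first; 120 = touch only OSDs more than 20% over the mean
    ceph osd test-reweight-by-utilization 120
    ceph osd reweight-by-utilization 120

    # raise the thresholds a few points so the backfill_toofull PGs can proceed,
    # then put them back once backfill finishes
    ceph osd set-nearfull-ratio 0.88
    ceph osd set-backfillfull-ratio 0.92
    ceph osd set-full-ratio 0.97

Do put the ratios back to their defaults once the cluster is healthy again; running with them raised permanently just postpones the next full-OSD surprise.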


> On Aug 27, 2020, at 8:28 AM, Dallas Jones <djones@xxxxxxxxxxxxxxxxx> wrote:
> 
> My 3-node Ceph cluster (14.2.4) has been running fine for months. However,
> my data pool became close to full a couple of weeks ago, so I added 12 new
> OSDs, roughly doubling the capacity of the cluster. However, the pool size
> has not changed, and the health of the cluster has changed for the worse.
> The dashboard shows the following cluster status:
> 
>   - PG_DEGRADED_FULL: Degraded data redundancy (low space): 2 pgs
>   backfill_toofull
>   - POOL_NEARFULL: 6 pool(s) nearfull
>   - OSD_NEARFULL: 1 nearfull osd(s)
> 
> Output from ceph -s:
> 
>  cluster:
>    id:     e5a47160-a302-462a-8fa4-1e533e1edd4e
>    health: HEALTH_ERR
>            1 nearfull osd(s)
>            6 pool(s) nearfull
>            Degraded data redundancy (low space): 2 pgs backfill_toofull
> 
>  services:
>    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 5w)
>    mgr: ceph01(active, since 4w), standbys: ceph03, ceph02
>    mds: cephfs:1 {0=ceph01=up:active} 2 up:standby
>    osd: 33 osds: 33 up (since 43h), 33 in (since 43h); 1094 remapped pgs
>    rgw: 3 daemons active (ceph01, ceph02, ceph03)
> 
>  data:
>    pools:   6 pools, 1632 pgs
>    objects: 134.50M objects, 7.8 TiB
>    usage:   42 TiB used, 81 TiB / 123 TiB avail
>    pgs:     213786007/403501920 objects misplaced (52.983%)
>             1088 active+remapped+backfill_wait
>             538  active+clean
>             4    active+remapped+backfilling
>             2    active+remapped+backfill_wait+backfill_toofull
> 
>  io:
>    recovery: 477 KiB/s, 330 keys/s, 29 objects/s
> 
> Can someone steer me in the right direction for how to get my cluster
> healthy again?
> 
> Thanks in advance!
> 
> -Dallas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



