OSD rebalancing issue - should drives be distributed equally over all nodes

Hi,

I'm facing several issues with my Ceph cluster (2x MDS, 6x OSD nodes).
Here I would like to focus on the PGs that are stuck in backfill_toofull.
I assume this is related to the fact that the data distribution across my
OSDs is not balanced.
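
To verify this I am looking at the per-OSD utilization (the %USE and VAR
columns) and at whether the balancer module is enabled (output omitted
here):

root@ld3955:~# ceph osd df tree
root@ld3955:~# ceph balancer status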

This is the current ceph status:
root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            78 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 2 pgs inactive, 2 pgs peering
            Degraded data redundancy: 304136/153251211 objects degraded
(0.198%), 57 pgs degraded, 57 pgs undersized
            Degraded data redundancy (low space): 265 pgs backfill_toofull
            3 pools have too many placement groups
            74 slow requests are blocked > 32 sec
            80 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
    mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
    mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
    osd: 368 osds: 368 up, 367 in; 302 remapped pgs

  data:
    pools:   5 pools, 8868 pgs
    objects: 51.08M objects, 195 TiB
    usage:   590 TiB used, 563 TiB / 1.1 PiB avail
    pgs:     0.023% pgs not active
             304136/153251211 objects degraded (0.198%)
             1672190/153251211 objects misplaced (1.091%)
             8564 active+clean
             196  active+remapped+backfill_toofull
             57   active+undersized+degraded+remapped+backfill_toofull
             35   active+remapped+backfill_wait
             12   active+remapped+backfill_wait+backfill_toofull
             2    active+remapped+backfilling
             2    peering

  io:
    recovery: 18 MiB/s, 4 objects/s
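
As far as I understand, backfill_toofull means that the backfill target OSD
is above the configured backfillfull ratio; the currently configured ratios
can be checked with:

root@ld3955:~# ceph osd dump | grep ratio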


Currently I'm using 6 OSD nodes:
Node A: 48x 1.6TB HDD
Node B: 48x 1.6TB HDD
Node C: 48x 1.6TB HDD
Node D: 48x 1.6TB HDD
Node E: 48x 7.2TB HDD
Node F: 48x 7.2TB HDD
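
Nodes E and F therefore each hold roughly 4.5 times the raw HDD capacity of
nodes A-D (7.2TB vs. 1.6TB per drive), which should also be visible in the
CRUSH host weights:

root@ld3955:~# ceph osd tree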

Question:
Is it advisable to distribute the drives equally across all nodes?
If yes, how should this be done without disrupting the cluster?
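
For context, the only procedure I can think of is to drain, physically move
and re-create the OSDs one at a time, roughly along these lines (just a
sketch; osd.0 and /dev/sdX are placeholders, and the exact ceph-volume
syntax depends on the release):

ceph osd crush reweight osd.0 0            # let the data backfill elsewhere
ceph osd out osd.0                         # once backfill has finished
systemctl stop ceph-osd@0                  # on the source node
ceph osd purge osd.0 --yes-i-really-mean-it
ceph-volume lvm create --data /dev/sdX     # on the target node, after moving the drive

Is that reasonable, or is there a less disruptive way to do it?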

Regards
Thomas

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



