Re: OSD rebalancing issue - should drives be distributed equally over all nodes

Hi,
I already have the balancer enabled in upmap mode:
root@ld3955:/mnt/pve/pve_cephfs/template/iso# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}
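
For completeness: as far as I know, upmap mode only takes effect when all clients speak at least Luminous. This is how I sanity-check that (output elided; the grep pattern is just how I look the field up):

root@ld3955:~# ceph osd dump | grep require_min_compat_client
root@ld3955:~# ceph features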


However, there are OSDs at 60% usage and others at 90% that belong to
the same pool and have the same disk size.
That looks like a very wide spread to me.
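
In case it helps, this is how I am measuring the spread; ceph osd df prints MIN/MAX variance and STDDEV in its summary lines:

root@ld3955:~# ceph osd df | tail -n 3
root@ld3955:~# ceph osd df tree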

Regards
Thomas

On 23.09.2019 at 11:42, EDH - Manuel Rios Fernandez wrote:
> Hi Thomas,
>
> For a 100% byte-based distribution of data across OSDs, you should set up the Ceph balancer in "byte" mode, not in PG mode.
>
> That change will bring all OSDs to the same usage percentage, but the objects will NOT be redundant.
>
> After several weeks and months of testing the balancer, the best profile is balancing by PG with upmap.
>
> In PG mode you will always get (until the balancer gets a better algorithm) data that is not equally distributed, and you sometimes have to redistribute weights manually via the CLI.
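>
> A minimal example of such a manual tweak (the OSD id and weight below are purely illustrative; reweight takes a value between 0 and 1):
>
> ceph osd reweight 123 0.95               # shift some PGs off an over-full OSD
> ceph osd reweight-by-utilization 110     # or let Ceph pick OSDs above 110% of mean utilization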
>
> You can play with the balancer directly from the Dashboard as of Nautilus. The balancer is not an "active" agent that is consulted before data is stored on disk; Ceph stores the data first, and then the balancer moves objects.
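>
> If you prefer to supervise it from the CLI instead of the Dashboard, the usual cycle looks like this (the plan name "myplan" is just an example):
>
> ceph balancer eval               # score the current distribution (lower is better)
> ceph balancer optimize myplan    # compute a plan
> ceph balancer show myplan        # inspect the proposed changes
> ceph balancer execute myplan     # apply them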
>
> Regards
>
> Manuel
>
>
> -----Original Message-----
> From: Thomas <74cmonty@xxxxxxxxx>
> Sent: Monday, 23 September 2019 11:08
> To: ceph-users@xxxxxxx
> Subject: OSD rebalancing issue - should drives be distributed equally over all nodes
>
> Hi,
>
> I'm facing several issues with my Ceph cluster (2x MDS, 6x OSD nodes).
> Here I would like to focus on the issue with PGs in backfill_toofull.
> I assume this is related to the fact that the data distribution across my OSDs is not balanced.
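>
> For reference, this is how I am looking at the affected PGs and the full thresholds (raising the backfillfull ratio would only be a temporary workaround, not a fix):
>
> ceph health detail | grep backfill_toofull
> ceph osd dump | grep ratio              # shows full_ratio, backfillfull_ratio, nearfull_ratio
> ceph osd set-backfillfull-ratio 0.92    # temporary relief only; the default is 0.90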
>
> This is the current ceph status:
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_ERR
>             1 MDSs report slow metadata IOs
>             78 nearfull osd(s)
>             1 pool(s) nearfull
>             Reduced data availability: 2 pgs inactive, 2 pgs peering
>             Degraded data redundancy: 304136/153251211 objects degraded (0.198%), 57 pgs degraded, 57 pgs undersized
>             Degraded data redundancy (low space): 265 pgs backfill_toofull
>             3 pools have too many placement groups
>             74 slow requests are blocked > 32 sec
>             80 stuck requests are blocked > 4096 sec
>
>   services:
>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
>     mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
>     mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
>     osd: 368 osds: 368 up, 367 in; 302 remapped pgs
>
>   data:
>     pools:   5 pools, 8868 pgs
>     objects: 51.08M objects, 195 TiB
>     usage:   590 TiB used, 563 TiB / 1.1 PiB avail
>     pgs:     0.023% pgs not active
>              304136/153251211 objects degraded (0.198%)
>              1672190/153251211 objects misplaced (1.091%)
>              8564 active+clean
>              196  active+remapped+backfill_toofull
>              57   active+undersized+degraded+remapped+backfill_toofull
>              35   active+remapped+backfill_wait
>              12   active+remapped+backfill_wait+backfill_toofull
>              2    active+remapped+backfilling
>              2    peering
>
>   io:
>     recovery: 18 MiB/s, 4 objects/s
>
>
> Currently I'm using 6 OSD nodes:
> Node A: 48x 1.6TB HDD
> Node B: 48x 1.6TB HDD
> Node C: 48x 1.6TB HDD
> Node D: 48x 1.6TB HDD
> Node E: 48x 7.2TB HDD
> Node F: 48x 7.2TB HDD
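>
> The capacity mismatch between the 1.6TB hosts and the 7.2TB hosts should also be visible in the per-host CRUSH weights, e.g.:
>
> ceph osd tree | grep host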
>
> Question:
> Is it advisable to distribute the drives equally over all nodes?
> If yes, how should this be done without disrupting Ceph?
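>
> In case it matters for the answer: if drives have to be moved between hosts, I would expect to throttle the resulting data movement roughly like this (a sketch of the usual knobs, not a tested procedure):
>
> ceph osd set norebalance                   # pause rebalancing while hosts are reorganized
> ceph config set osd osd_max_backfills 1    # limit concurrent backfills per OSD
> ceph osd unset norebalance                 # then let data move at a controlled pace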
>
> Regards
> Thomas
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



