On 8/14/19 9:48 AM, Simon Oosthoek wrote:
> Hi all,
>
> Yesterday I marked out all the osds on one node in our new cluster to
> reconfigure them with WAL/DB on their NVMe devices, but it is taking
> ages to rebalance. The whole cluster (and thus the osds) is only ~1%
> full, therefore the full ratio is nowhere in sight.
>
> We have 14 osd nodes with 12 disks each; one of them was marked out
> yesterday around noon. It is still not completed and all the while, the
> cluster is in ERROR state, even though this is a normal maintenance
> operation.
>
> We are still experimenting with the cluster, and it is still operational
> while being in ERROR state; however, it is slightly worrying when
> considering that it could take even (50x?) longer if the cluster has 50x
> the amount of data. And the OSDs are mostly flatlined in the dashboard
> graphs, so it could potentially do it much faster, I think.
>
> Below are a few outputs of ceph -s(w):
>
> Yesterday afternoon (~16:00)
> # ceph -w
>   cluster:
>     id:     b489547c-ba50-4745-a914-23eb78e0e5dc
>     health: HEALTH_ERR
>             Degraded data redundancy (low space): 139 pgs backfill_toofull
>
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 4h)
>     mgr: cephmon1(active, since 4h), standbys: cephmon2, cephmon3
>     mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
>     osd: 168 osds: 168 up (since 3h), 156 in (since 3h); 1588 remapped pgs
>     rgw: 1 daemon active (cephs3.rgw0)
>
>   data:
>     pools:   12 pools, 4116 pgs
>     objects: 14.04M objects, 11 TiB
>     usage:   20 TiB used, 1.7 PiB / 1.8 PiB avail
>     pgs:     16188696/109408503 objects misplaced (14.797%)
>              2528 active+clean
>              1422 active+remapped+backfill_wait
>              139  active+remapped+backfill_wait+backfill_toofull
>              27   active+remapped+backfilling
>
>   io:
>     recovery: 205 MiB/s, 198 objects/s
>
>   progress:
>     Rebalancing after osd.47 marked out
>       [=====================.........]
>     Rebalancing after osd.5 marked out
>       [===================...........]
>     Rebalancing after osd.132 marked out
>       [=====================.........]
>     Rebalancing after osd.90 marked out
>       [=====================.........]
>     Rebalancing after osd.76 marked out
>       [=====================.........]
>     Rebalancing after osd.157 marked out
>       [==================............]
>     Rebalancing after osd.19 marked out
>       [=====================.........]
>     Rebalancing after osd.118 marked out
>       [====================..........]
>     Rebalancing after osd.146 marked out
>       [=================.............]
>     Rebalancing after osd.104 marked out
>       [====================..........]
>     Rebalancing after osd.62 marked out
>       [=======================.......]
>     Rebalancing after osd.33 marked out
>       [======================........]
>
> This morning:
> # ceph -s
>   cluster:
>     id:     b489547c-ba50-4745-a914-23eb78e0e5dc
>     health: HEALTH_ERR
>             Degraded data redundancy (low space): 8 pgs backfill_toofull
>
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 22h)
>     mgr: cephmon1(active, since 22h), standbys: cephmon2, cephmon3
>     mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
>     osd: 168 osds: 168 up (since 22h), 156 in (since 21h); 189 remapped pgs
>     rgw: 1 daemon active (cephs3.rgw0)
>
>   data:
>     pools:   12 pools, 4116 pgs
>     objects: 14.11M objects, 11 TiB
>     usage:   21 TiB used, 1.7 PiB / 1.8 PiB avail
>     pgs:     4643284/110159565 objects misplaced (4.215%)
>              3927 active+clean
>              162  active+remapped+backfill_wait
>              19   active+remapped+backfilling
>              8    active+remapped+backfill_wait+backfill_toofull
>
>   io:
>     client:   32 KiB/s rd, 0 B/s wr, 31 op/s rd, 21 op/s wr
>     recovery: 198 MiB/s, 149 objects/s
>
> It is still recovering, it seems, at 149 objects/second.
>
>   progress:
>     Rebalancing after osd.47 marked out
>       [=============================.]
>     Rebalancing after osd.5 marked out
>       [=============================.]
>     Rebalancing after osd.132 marked out
>       [=============================.]
>     Rebalancing after osd.90 marked out
>       [=============================.]
>     Rebalancing after osd.76 marked out
>       [=============================.]
>     Rebalancing after osd.157 marked out
>       [=============================.]
>     Rebalancing after osd.19 marked out
>       [=============================.]
>     Rebalancing after osd.146 marked out
>       [=============================.]
>     Rebalancing after osd.104 marked out
>       [=============================.]
>     Rebalancing after osd.62 marked out
>       [=============================.]
>
>
> I found some hints, though I'm not sure they are right for us, at this URL:
> https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/
>
>> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
>> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>
> Since the cluster is currently hardly loaded, backfilling can take up
> all the unused bandwidth as far as I'm concerned...
>
> Is it a good idea to give the above commands or other commands to speed
> up the backfilling? (e.g. like increasing "osd max backfills")
>

Yes, since right now the OSDs aren't doing that many backfills and you
still have a large queue of PGs which need to be backfilled.

$ ceph tell osd.* config set osd_max_backfills 5

The default is that only one (1) backfill runs at the same time per OSD.
By setting it to 5 you speed up the process by increasing the
concurrency. This will, however, add load to the system and thus reduce
the available I/O for clients.

Wido

> Cheers
>
> /Simon
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
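
As a rough sketch of the tuning described above (not taken from the thread
itself; it assumes a Nautilus-era cluster where the "ceph tell osd.* config
set" interface used by Wido is available, and osd.0 below is only an example
daemon), the setting can be checked first, raised for the duration of the
backfill, and reverted once the cluster is healthy again:

  # Check the value an OSD is currently running with
  # (admin-socket command, run on the host where osd.0 lives)
  ceph daemon osd.0 config get osd_max_backfills

  # Allow more concurrent backfills per OSD while the cluster is otherwise idle
  ceph tell osd.* config set osd_max_backfills 5
  ceph tell osd.* config set osd_recovery_max_active 4

  # Once backfill has finished, drop back to the defaults
  # (1 for osd_max_backfills; osd_recovery_max_active defaults to 3 in Nautilus)
  ceph tell osd.* config set osd_max_backfills 1
  ceph tell osd.* config set osd_recovery_max_active 3

Values changed via tell/injectargs take effect at runtime only and are lost
when an OSD restarts, so nothing needs to be cleaned up in ceph.conf or the
monitor config database afterwards.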