Hello,
I have a cluster with 6 OSD nodes, each with 10 SATA 8 TB drives; node 6 was just added. All nodes are on 10 Gbps networking with jumbo frames. S3 application access is working as expected, but recovery is extremely slow. Based on past posts, I have tried the following:
Altering osd_recovery_sleep_hdd: I tried 0 and 0.1; 0 seems to improve the speed slightly, but it is still very slow. I also changed osd_max_backfills from 8 to 16 and osd_recovery_max_active from 4 to 8, which showed no noticeable improvement (the commands I used are sketched after the status output below). The cluster is running 13.2.5. Here is the output from ceph -s:
  cluster:
    id:     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    health: HEALTH_ERR
            3 large omap objects
            67164650/268993641 objects misplaced (24.969%)
            Degraded data redundancy: 612258/268993641 objects degraded (0.228%), 8 pgs degraded, 8 pgs undersized
            Degraded data redundancy (low space): 9 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1
    osd: 55 osds: 50 up, 50 in; 531 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   15 pools, 1476 pgs
    objects: 89.66 M objects, 49 TiB
    usage:   159 TiB used, 205 TiB / 364 TiB avail
    pgs:     612258/268993641 objects degraded (0.228%)
             67164650/268993641 objects misplaced (24.969%)
             945 active+clean
             507 active+remapped+backfill_wait
             9   active+remapped+backfill_wait+backfill_toofull
             7   active+remapped+backfilling
             4   active+undersized+degraded+remapped+backfill_wait
             4   active+undersized+degraded+remapped+backfilling

  io:
    client:   5.3 MiB/s rd, 3.9 MiB/s wr, 844 op/s rd, 81 op/s wr
    recovery: 19 MiB/s, 33 objects/s
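For reference, the settings above were changed at runtime roughly as follows. This is a sketch from memory rather than my exact shell history, the numeric values are simply what I tried, and the daemon-socket check has to be run on the node hosting that particular OSD:

  # adjust recovery/backfill throttles on all OSDs at runtime
  ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0'
  ceph tell osd.* injectargs '--osd_max_backfills 16 --osd_recovery_max_active 8'

  # verify what a given OSD is actually running with (run on that OSD's host)
  ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_sleep_hdd'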
Any clues as to what I can look at further to investigate the slow recovery would be appreciated.