Martin Conway wrote:
> I find that backfilling and possibly scrubbing often comes to a halt for no apparent
> reason. If I put a server into maintenance mode or kill and restart OSDs it bursts back
> into life again.
>
> Not sure how to diagnose why the recovery processes have stalled.

My cluster is in this stalled state now, and I have saved some details below. Everything seems to point quite heavily to OSD.32 and OSD.33, but there is nothing of note in their logs. They were experiencing slow ops last night, and this morning they have logged nothing. I am certain recovery and scrubbing would resume if I restarted those OSDs, but it would be nice to know what keeps causing this.

ceph -s
  cluster:
    id:     16bb4f7a-cf04-4667-aeee-94ce7f6ab672
    health: HEALTH_WARN
            441 pgs not deep-scrubbed in time
            43 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum scustor3,scustor2,scustor1,scustor4,scustor5 (age 23h)
    mgr: scustor3.wplaov(active, since 2d), standbys: scustor4.giyegr, scustor1.luywbi, scustor2.ncfaec
    mds: 2/2 daemons up, 1 standby
    osd: 31 osds: 31 up (since 23h), 31 in (since 4d); 54 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 897 pgs
    objects: 48.52M objects, 47 TiB
    usage:   130 TiB used, 130 TiB / 260 TiB avail
    pgs:     3497437/145569867 objects misplaced (2.403%)
             757 active+clean
             70  active+clean+scrubbing
             48  active+remapped+backfilling
             14  active+clean+scrubbing+deep
             6   active+recovering+remapped
             2   active+recovering

  io:
    client:   24 KiB/s rd, 416 KiB/s wr, 1 op/s rd, 44 op/s wr

ceph pg dump
https://pastebin.com/raw/KPBie7SD

ceph pg dump_stuck
PG_STAT  STATE                        UP          UP_PRIMARY  ACTING      ACTING_PRIMARY
5.1f4    active+remapped+backfilling  [32,5,18]   32          [32,5,21]   32
5.1f0    active+remapped+backfilling  [32,2,13]   32          [32,2,5]    32
5.1e4    active+remapped+backfilling  [33,13,10]  33          [33,6,5]    33
5.1c5    active+remapped+backfilling  [33,2,10]   33          [33,10,16]  33
5.1bc    active+remapped+backfilling  [18,33,11]  18          [33,11,6]   33
5.193    active+remapped+backfilling  [33,14,3]   33          [33,13,5]   33
5.180    active+remapped+backfilling  [32,13,1]   32          [32,1,10]   32
5.171    active+remapped+backfilling  [33,1,18]   33          [33,1,20]   33
5.16b    active+recovering+remapped   [32,9,13]   32          [32,9,2]    32
5.16a    active+remapped+backfilling  [13,4,6]    13          [33,4,16]   33
5.169    active+remapped+backfilling  [14,33,10]  14          [33,10,13]  33
5.162    active+remapped+backfilling  [32,6,13]   32          [32,6,22]   32
5.130    active+remapped+backfilling  [13,32,3]   13          [32,3,1]    32
5.1cd    active+remapped+backfilling  [33,5,18]   33          [33,5,13]   33
6.3b     active+recovering            [32,1,3]    32          [32,1,3]    32
6.42     active+remapped+backfilling  [18,33,9]   18          [33,9,13]   33
5.4e     active+remapped+backfilling  [32,11,10]  32          [32,10,22]  32
5.167    active+remapped+backfilling  [33,18,6]   33          [33,6,2]    33
6.20     active+remapped+backfilling  [14,32,10]  14          [32,13,1]   32
5.52     active+remapped+backfilling  [14,1,32]   14          [32,1,13]   32
5.57     active+recovering+remapped   [32,9,13]   32          [32,9,22]   32
5.49     active+remapped+backfilling  [18,32,9]   18          [32,9,13]   32
5.1f7    active+recovering+remapped   [9,32,13]   9           [32,20,1]   32
5.100    active+remapped+backfilling  [33,9,13]   33          [33,9,20]   33
5.58     active+remapped+backfilling  [32,6,4]    32          [32,6,20]   32
5.16     active+recovering+remapped   [32,18,3]   32          [32,3,9]    32
5.60     active+recovering+remapped   [31,13,4]   31          [32,4,9]    32
5.fc     active+recovering            [32,9,10]   32          [32,9,10]   32
5.c8     active+remapped+backfilling  [32,14,3]   32          [32,3,13]   32
5.ad     active+remapped+backfilling  [32,3,16]   32          [32,3,22]   32
5.6e     active+remapped+backfilling  [32,5,18]   32          [32,5,11]   32
6.6d     active+remapped+backfilling  [32,6,5]    32          [32,5,21]   32
5.cf     active+remapped+backfilling  [33,9,18]   33          [33,11,16]  33
5.7e     active+remapped+backfilling  [32,9,18]   32          [32,4,16]   32
6.37     active+remapped+backfilling  [32,4,3]    32          [32,4,21]   32
5.1aa    active+remapped+backfilling  [13,33,10]  13          [33,10,1]   33
5.165    active+remapped+backfilling  [33,5,16]   33          [33,5,21]   33
5.76     active+remapped+backfilling  [13,1,32]   13          [32,1,22]   32
5.102    active+remapped+backfilling  [33,5,6]    33          [33,5,21]   33
5.2d     active+remapped+backfilling  [32,18,4]   32          [32,4,5]    32
6.24     active+remapped+backfilling  [33,18,2]   33          [33,9,3]    33
5.f6     active+remapped+backfilling  [32,1,14]   32          [32,1,22]   32
5.1c     active+remapped+backfilling  [33,18,3]   33          [33,3,22]   33
5.d9     active+remapped+backfilling  [33,18,11]  33          [33,11,9]   33
5.184    active+remapped+backfilling  [32,14,2]   32          [32,20,5]   32
5.e6     active+remapped+backfilling  [18,33,16]  18          [33,16,13]  33
5.18f    active+recovering+remapped   [18,32,9]   18          [32,9,13]   32
5.e9     active+remapped+backfilling  [32,13,9]   32          [32,13,2]   32
5.55     active+remapped+backfilling  [32,6,14]   32          [32,6,3]    32
5.eb     active+remapped+backfilling  [18,33,11]  18          [32,20,6]   32
6.13     active+remapped+backfilling  [14,10,1]   14          [32,20,13]  32
5.107    active+remapped+backfilling  [14,3,31]   14          [32,3,1]    32
5.109    active+remapped+backfilling  [32,4,14]   32          [32,4,13]   32
5.117    active+remapped+backfilling  [33,16,3]   33          [33,16,20]  33
6.30     active+remapped+backfilling  [32,4,1]    32          [32,1,21]   32
5.126    active+remapped+backfilling  [33,9,4]    33          [33,9,21]   33
ok

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
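The observation that the stuck PGs point to OSD.32 and OSD.33 can be checked mechanically by counting the ACTING_PRIMARY column of the `ceph pg dump_stuck` output; in the listing in this post, every row has 32 or 33 there. Below is a minimal sketch of such a tally, assuming the plain-text six-column layout shown in this post (PG_STAT, STATE, UP, UP_PRIMARY, ACTING, ACTING_PRIMARY). The function name `tally_acting_primaries` and the `SAMPLE` excerpt are illustrative only and not part of any Ceph tooling; other Ceph releases may print different columns, in which case the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumed row layout, taken from the dump_stuck listing in this post:
# PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
ROW = re.compile(
    r"^(?P<pg>[0-9]+\.[0-9a-f]+)\s+(?P<state>\S+)\s+"
    r"\[(?P<up>[\d,]+)\]\s+(?P<up_primary>\d+)\s+"
    r"\[(?P<acting>[\d,]+)\]\s+(?P<acting_primary>\d+)\s*$"
)


def tally_acting_primaries(text):
    """Count stuck PGs per acting primary OSD.

    Lines that do not look like a PG row (the header, the trailing
    'ok', blank lines) are silently skipped.
    """
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[int(match.group("acting_primary"))] += 1
    return counts


# Three rows copied from the listing in this post, for illustration.
SAMPLE = """\
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
5.1f4 active+remapped+backfilling [32,5,18] 32 [32,5,21] 32
5.1e4 active+remapped+backfilling [33,13,10] 33 [33,6,5] 33
5.1bc active+remapped+backfilling [18,33,11] 18 [33,11,6] 33
ok
"""

if __name__ == "__main__":
    # Print OSDs ordered by how many stuck PGs they are acting primary for.
    for osd, n in tally_acting_primaries(SAMPLE).most_common():
        print(f"osd.{osd}: {n} stuck PGs")
```

To run it against a live cluster, one option is to save the output with `ceph pg dump_stuck > stuck.txt` and pass the file contents to `tally_acting_primaries` in place of `SAMPLE`. A tally concentrated on one or two OSDs only narrows down where to look; it does not by itself explain why recovery stalled.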