I had already increased osd_max_backfills and osd_recovery_max_active
to speed things up, and most of the PGs were remapped pretty
quickly (within a couple of minutes), but these last 3 PGs took almost
two hours to complete, which was unexpected.
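(For reference, the two settings mentioned above can be changed at runtime; a sketch, with illustrative values — tune to taste:)

```shell
# Persist higher backfill/recovery parallelism via the centralized
# config store (Nautilus and later):
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8

# Or inject into the running OSDs without persisting across restarts:
ceph tell 'osd.*' injectargs '--osd_max_backfills 4 --osd_recovery_max_active 8'
```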
Quoting Frank Schilder <frans@xxxxxx>:
Your metadata PGs *are* backfilling: that's the "61 keys/s" figure
on the recovery I/O line of the ceph status output. If this is too
slow, increase osd_max_backfills and osd_recovery_max_active.
Or just have some coffee ...
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 10 October 2019 14:54:37
To: ceph-users@xxxxxxx
Subject: Nautilus: PGs stuck remapped+backfilling
Hi all,
I have a strange issue with backfilling and I'm not sure what the cause is.
It's an (upgraded) Nautilus cluster that has an SSD cache tier for
OpenStack and CephFS metadata residing on the same SSDs; there were
three SSDs in total.
Today I added two new SSDs (NVMe) (osd.15, osd.16) to be able to
shut off one old server that has only one SSD-OSD left (osd.20).
Setting the crush weight of osd.20 to 0 (and adjusting the weights of
the remaining SSDs for an even distribution) leaves 3 PGs in
active+remapped+backfilling state. I don't understand why these
remaining PGs aren't finishing their backfill; the crush rule is quite
simple (all ssd pools are replicated with size 3). The backfilling PGs
are all from the cephfs_metadata pool. Since there are 4 SSDs for 3
replicas, the backfilling should still finish, right?
Can anyone share their thoughts why 3 PGs can't be recovered? If more
information about the cluster is required please let me know.
Regards,
Eugen
ceph01:~ # ceph osd pool ls detail | grep meta
pool 36 'cephfs-metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 16 pgp_num 16 last_change 283362 flags hashpspool,nodelete,nodeep-scrub stripe_width 0 application cephfs
ceph01:~ # ceph pg dump | grep remapp
dumped all
36.b 28306 0 0 28910 0 8388608 101408323 219497 3078 3078 active+remapped+backfilling 2019-10-10 13:36:27.427527 284595'98565869 284595:254216941 [15,16,9] 15 [20,9,10] 20 284427'98489406 2019-10-10 00:16:02.682911 284089'98003598 2019-10-06 16:03:27.558267 0
36.d 28087 0 0 25327 0 26375382 106722204 231020 3041 3041 active+remapped+backfilling 2019-10-10 13:36:27.404739 284595'97933905 284595:252878816 [16,15,9] 16 [20,9,10] 20 284427'97887652 2019-10-10 04:13:29.371905 284259'97502135 2019-10-07 20:06:43.304593 0
36.4 28060 0 0 28406 0 8389242 104059103 225188 3061 3061 active+remapped+backfilling 2019-10-10 13:36:27.440390 284595'105299618 284595:312976619 [16,9,15] 16 [20,9,10] 20 284427'105218591 2019-10-10 00:18:07.924006 284089'104696098 2019-10-06 16:20:17.123149 0
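(Editor's aside: a rough sanity check of the observed duration, assuming the ~220k figure per PG in the dump above is its omap key count and recovery sustains roughly the 61 keys/s reported in ceph status further down:)

```python
# Approximate per-PG omap key counts taken from the pg dump above
# (assumption: that column is OMAP_KEYS), plus the recovery rate
# from the `ceph -s` output below.
omap_keys = {"36.b": 219497, "36.d": 231020, "36.4": 225188}
keys_per_sec = 61

per_pg_hours = {pg: k / keys_per_sec / 3600 for pg, k in omap_keys.items()}
total_hours = sum(per_pg_hours.values())

print(per_pg_hours)          # each PG needs roughly an hour at this rate
print(round(total_hours, 1))
```

With the three backfills partly overlapping, a wall-clock time of around two hours for all of them is plausible at this key rate.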
rule ssd_ruleset {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}
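(Editor's aside: the rule's "step chooseleaf firstn 0 type host" means each replica lands on a distinct host. A minimal Python sketch of that placement constraint, using a hypothetical host map mirroring the osd tree below — this is not the real CRUSH algorithm, just the invariant it enforces:)

```python
# Hypothetical SSD-OSD -> host map matching the osd tree in this mail.
HOST_OF = {15: "ceph01", 16: "ceph02", 10: "san01-ssd",
           9: "san02-ssd", 20: "san03-ssd"}

def chooseleaf_firstn(candidates, n):
    """Pick up to n OSDs such that no two share a host, i.e. the
    constraint 'step chooseleaf firstn 0 type host' enforces for size=n."""
    chosen, used_hosts = [], set()
    for osd in candidates:
        host = HOST_OF[osd]
        if host not in used_hosts:
            chosen.append(osd)
            used_hosts.add(host)
        if len(chosen) == n:
            break
    return chosen

# With osd.20 weighted out, four SSD hosts remain for three replicas:
print(chooseleaf_firstn([15, 16, 9, 10], 3))  # → [15, 16, 9]
```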
This is the relevant part of the osd tree:
ceph01:~ # ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 34.21628 root default
-31 11.25406 host ceph01
25 hdd 3.59999 osd.25 up 1.00000 1.00000
26 hdd 3.59999 osd.26 up 1.00000 1.00000
27 hdd 3.59999 osd.27 up 1.00000 1.00000
15 ssd 0.45409 osd.15 up 1.00000 1.00000
-34 11.25406 host ceph02
0 hdd 3.59999 osd.0 up 1.00000 1.00000
28 hdd 3.59999 osd.28 up 1.00000 1.00000
29 hdd 3.59999 osd.29 up 1.00000 1.00000
16 ssd 0.45409 osd.16 up 1.00000 1.00000
-37 10.79999 host ceph03
31 hdd 3.59999 osd.31 up 1.00000 1.00000
32 hdd 3.59999 osd.32 up 1.00000 1.00000
33 hdd 3.59999 osd.33 up 1.00000 1.00000
-24 0.45409 host san01-ssd
10 ssd 0.45409 osd.10 up 1.00000 1.00000
-23 0.45409 host san02-ssd
9 ssd 0.45409 osd.9 up 1.00000 1.00000
-22 0 host san03-ssd
20 ssd 0 osd.20 up 1.00000 1.00000
Don't be confused by the '-ssd' suffix; we're using crush
location hooks.
This is the current PG distribution on the SSDs:
ceph01:~ # ceph osd df | grep -E "^15 |^16 |^ 9|^10 |^20 "
15 ssd 0.45409 1.00000 465 GiB 34 GiB 32 GiB 1.2 GiB 857 MiB 431 GiB 7.29 0.22 27 up
16 ssd 0.45409 1.00000 465 GiB 37 GiB 34 GiB 1.5 GiB 964 MiB 428 GiB 7.87 0.23 31 up
10 ssd 0.45409 1.00000 745 GiB 27 GiB 25 GiB 1.7 GiB 950 MiB 718 GiB 3.65 0.11 29 up
 9 ssd 0.45409 1.00000 745 GiB 34 GiB 32 GiB 1.3 GiB 902 MiB 711 GiB 4.60 0.14 30 up
20 ssd 0       1.00000 894 GiB 8.2 GiB 4.3 GiB 1.5 GiB 2.4 GiB 886 GiB 0.91 0.03  3 up
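(Editor's aside: the %USE column in that output is simply used/size; recomputing it from the rounded GiB figures shown above gives values within rounding drift of the printed ones:)

```python
# (osd id, used GiB, size GiB) taken from the osd df output above;
# small drift vs. the printed %USE is expected, since the GiB
# figures are themselves rounded.
osds = [(15, 34, 465), (16, 37, 465), (10, 27, 745), (9, 34, 745), (20, 8.2, 894)]

for osd, used, size in osds:
    print(f"osd.{osd}: {100 * used / size:.2f} %USE")
```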
Current ceph status:
ceph01:~ # ceph -s
cluster:
id: 655cb05a-435a-41ba-83d9-8549f7c36167
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 2d)
mgr: ceph03(active, since 8d), standbys: ceph01, ceph02
mds: cephfs:1 {0=mds01=up:active} 1 up:standby-replay 1 up:standby
osd: 26 osds: 26 up (since 66m), 26 in (since 66m); 3 remapped pgs
data:
pools: 8 pools, 264 pgs
objects: 4.96M objects, 5.0 TiB
usage: 16 TiB used, 31 TiB / 47 TiB avail
pgs: 115745/14865558 objects misplaced (0.779%)
261 active+clean
3 active+remapped+backfilling
io:
client: 903 KiB/s rd, 8.8 MiB/s wr, 85 op/s rd, 266 op/s wr
recovery: 0 B/s, 61 keys/s, 12 objects/s
cache: 4.2 MiB/s flush, 15 MiB/s evict, 0 op/s promote
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx