Hi,
TL;DR:
Selecting a different CRUSH rule (stretch_rule, which does not filter on
device class) for the SSD pool results in degraded objects (unexpected) and
misplaced objects (expected). Why would Ceph drop up to two healthy copies?
Consider this two-data-center cluster:
ID   CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 -1         0.78400  root default
-10         0.39200      datacenter DC1
 -3         0.39200          host pve1
  0    hdd  0.09799              osd.0       up   1.00000  1.00000
  1    hdd  0.09799              osd.1       up   1.00000  1.00000
  4    ssd  0.09799              osd.4       up   1.00000  1.00000
  5    ssd  0.09799              osd.5       up   1.00000  1.00000
-11         0.39200      datacenter DC2
 -5         0.39200          host pve2
  2    hdd  0.09799              osd.2       up   1.00000  1.00000
  3    hdd  0.09799              osd.3       up   1.00000  1.00000
  6    ssd  0.09799              osd.6       up   1.00000  1.00000
  7    ssd  0.09799              osd.7       up   1.00000  1.00000
Pools available:
device_health_metrics
HDD
SSD
Let's focus on SSD for now. The CRUSH rule in use for the SSD pool:
rule SSD {
        id 2
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step choose firstn 0 type host
        step chooseleaf firstn 0 type osd
        step emit
}
SSD pool replication settings: min_size=2, size=4
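For reference, the rule and replication settings the pool currently uses can
be confirmed like this (a minimal sketch; pool name SSD as above):

ceph osd pool get SSD crush_rule
ceph osd pool get SSD size
ceph osd pool get SSD min_size
# or all pool settings at once:
ceph osd pool ls detail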
The new stretch rule to use:
rule stretch_rule {
        id 3
        type replicated
        min_size 1
        max_size 10
        step take default
        step take DC1
        step choose firstn 0 type host
        step chooseleaf firstn 2 type osd
        step emit
        step take default
        step take DC2
        step choose firstn 0 type host
        step chooseleaf firstn 2 type osd
        step emit
}
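For completeness, one way to sanity-check and apply such a rule (a rough
sketch; file names are arbitrary, and rule id 3 matches the rule above):

ceph osd getcrushmap -o crushmap.bin           # export the current CRUSH map
crushtool -d crushmap.bin -o crushmap.txt      # decompile, then add stretch_rule by hand
crushtool -c crushmap.txt -o crushmap.new      # recompile the edited map
crushtool --test -i crushmap.new --rule 3 --num-rep 4 --show-mappings  # dry-run the mappings
ceph osd setcrushmap -i crushmap.new           # load the new map
ceph osd pool set SSD crush_rule stretch_rule  # point the SSD pool at the new rule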
ceph pg ls for pool 3 (SSD); the columns are PG, OBJECTS, DEGRADED, MISPLACED,
UNFOUND, BYTES, OMAP_BYTES, OMAP_KEYS, LOG, STATE, SINCE, VERSION, REPORTED,
UP, ACTING, SCRUB_STAMP, DEEP_SCRUB_STAMP:
3.0 184 0 0 0 738070528 0 0 7458 active+clean 9m 926'7458 951:18695 [7,6,5,4]p7 [7,6,5,4]p7 2024-06-05T08:52:51.951936+0200 2024-06-05T08:52:51.951936+0200
3.1 169 0 0 0 667156655 0 0 3242 active+clean 18m 926'3242 951:14151 [4,5,6,7]p4 [4,5,6,7]p4 2024-06-05T08:54:42.948682+0200 2024-06-05T08:54:42.948682+0200
3.2 221 0 0 0 885641968 0 0 6989 active+clean 9m 926'6989 951:17645 [4,5,7,6]p4 [4,5,7,6]p4 2024-06-05T08:53:27.981787+0200 2024-06-05T08:53:27.981787+0200
3.3 180 0 0 0 716509184 0 0 3194 active+clean 9m 926'3194 951:14191 [4,5,6,7]p4 [4,5,6,7]p4 2024-06-05T08:54:29.584216+0200 2024-06-05T08:54:29.584216+0200
3.4 189 0 0 0 754417698 137 8 3616 active+clean 9m 926'3616 951:19245 [6,7,4,5]p6 [6,7,4,5]p6 2024-06-05T08:54:02.307323+0200 2024-06-05T08:54:02.307323+0200
3.5 188 0 0 0 742543377 0 0 5992 active+clean 9m 926'5992 951:18862 [6,7,5,4]p6 [6,7,5,4]p6 2024-06-05T08:53:09.483136+0200 2024-06-05T08:53:09.483136+0200
3.6 191 0 0 0 769482752 150 16 6810 active+clean 9m 926'6810 951:30043 [7,6,4,5]p7 [7,6,4,5]p7 2024-06-05T08:53:46.646517+0200 2024-06-05T08:53:46.646517+0200
3.7 170 0 0 0 681587379 0 0 10081 active+clean 9m 926'21581 951:28473 [4,5,7,6]p4 [4,5,7,6]p4 2024-06-05T08:54:16.047967+0200 2024-06-05T08:54:16.047967+0200
ceph pg ls after the new CRUSH rule is selected for pool SSD (same columns as above):
3.0 184 372 186 0 738070528 0 0 7458 active+recovery_wait+undersized+degraded+remapped 14s 926'7458 955:18688 [1,0,3,7]p1 [4,7]p4 2024-06-05T08:52:51.951936+0200 2024-06-05T08:52:51.951936+0200
3.1 169 0 0 0 667156655 0 0 3242 active+clean 19m 926'3242 954:14154 [4,5,6,7]p4 [4,5,6,7]p4 2024-06-05T08:54:42.948682+0200 2024-06-05T08:54:42.948682+0200
3.2 221 444 0 0 885641968 0 0 6989 active+recovery_wait+undersized+degraded+remapped 14s 926'6989 955:17657 [4,5,3,2]p4 [4,5]p4 2024-06-05T08:53:27.981787+0200 2024-06-05T08:53:27.981787+0200
3.3 180 0 540 0 716509184 0 0 3194 active+recovering+undersized+degraded+remapped 15s 926'3194 955:14204 [4,0,2,3]p4 [4,0]p4 2024-06-05T08:54:29.584216+0200 2024-06-05T08:54:29.584216+0200
3.4 189 378 0 0 754417698 0 0 3616 active+recovery_wait+undersized+degraded+remapped 15s 926'3616 955:19220 [1,4,6,2]p1 [4,6]p4 2024-06-05T08:54:02.307323+0200 2024-06-05T08:54:02.307323+0200
3.5 188 189 0 0 742543377 0 0 5992 active+recovery_wait+undersized+degraded+remapped 15s 926'5992 955:18845 [5,4,6,3]p5 [5,4,6]p5 2024-06-05T08:53:09.483136+0200 2024-06-05T08:53:09.483136+0200
3.6 191 390 195 0 769482752 0 0 6810 active+recovery_wait+undersized+degraded+remapped 14s 926'6810 955:30016 [0,1,2,7]p0 [4,7]p4 2024-06-05T08:53:46.646517+0200 2024-06-05T08:53:46.646517+0200
3.7 170 0 170 0 681587379 0 0 10081 active+remapped+backfill_wait 14s 926'21581 955:28486 [4,5,3,6]p4 [4,5,6,7]p4 2024-06-05T08:54:16.047967+0200 2024-06-05T08:54:16.047967+0200
So CRUSH is able to find a suitable mapping just fine, but somehow Ceph
decides to drop up to two healthy copies from its acting set, and I do
not understand why. I would expect only misplaced objects at this point.
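To dig further, one of the degraded PGs can be queried directly; the up/acting
sets and the recovery_state section show what the OSDs think is missing
(sketch, using PG 3.0 from the listing above):

ceph pg 3.0 query        # look at "up", "acting" and "recovery_state"
ceph health detail       # summary of degraded/undersized PGs
ceph pg ls-by-pool SSD   # per-PG state for the SSD pool only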
The Ceph version is 15.2.17, with the latest CRUSH tunables (ceph osd crush
tunables optimal).
Am I missing something obvious here? If so, would you please point it out
to me? :D
Gr. Stefan