Re: degraded objects when setting different CRUSH rule on a pool, why?

Hi Stefan,

I assume the number of dropped replicas is related to the pool's min_size: with min_size 2, Ceph apparently keeps only two replicas in the acting set during the transition. If you increase min_size to 3 you should see only one replica dropped from the acting set. I haven't run very detailed tests, but a first quick one seems to confirm this:

# Test with min_size 2, size 4 (columns: PG, UP set, ACTING set)
48.7   [6,2,11,10]p6         [6,2]p6

I changed the rule back to the previous one, raised min_size, and then switched rules again:

# Test with min_size 3, size 4 (columns: PG, UP set, ACTING set)
48.7   [6,2,11,10]p6      [6,2,10]p6
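
For reference, the whole test boils down to something like the following standard pool commands (pool name and rule name are just placeholders for my setup):

# check the current replication settings
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size

# raise min_size before switching rules
ceph osd pool set <pool> min_size 3

# switch the pool to the other CRUSH rule and watch UP vs. ACTING
ceph osd pool set <pool> crush_rule <rule>
ceph pg ls-by-pool <pool>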

I don't really have an explanation for why they are degraded, though. I think Frank already had some open bug reports for that; this topic comes up every now and then without a satisfying explanation.

Regards,
Eugen

Quoting Stefan Kooman <stefan@xxxxxx>:

Hi,

TL;DR:

Selecting a different CRUSH rule (stretch_rule, no device class) for pool SSD results in degraded objects (unexpected) and misplaced objects (expected). Why would Ceph drop up to two healthy copies?

Consider this two-data-center cluster:

ID   CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 -1         0.78400  root default
-10         0.39200      datacenter DC1
 -3         0.39200          host pve1
  0    hdd  0.09799              osd.0       up   1.00000  1.00000
  1    hdd  0.09799              osd.1       up   1.00000  1.00000
  4    ssd  0.09799              osd.4       up   1.00000  1.00000
  5    ssd  0.09799              osd.5       up   1.00000  1.00000
-11         0.39200      datacenter DC2
 -5         0.39200          host pve2
  2    hdd  0.09799              osd.2       up   1.00000  1.00000
  3    hdd  0.09799              osd.3       up   1.00000  1.00000
  6    ssd  0.09799              osd.6       up   1.00000  1.00000
  7    ssd  0.09799              osd.7       up   1.00000  1.00000

Pools available:

device_health_metrics
HDD
SSD

Let's focus on SSD for now. The CRUSH rule in use for the SSD pool:

rule SSD {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step choose firstn 0 type host
    step chooseleaf firstn 0 type osd
    step emit
}

SSD pool replication settings: min_size=2, size=4

The new stretch rule to use:

rule stretch_rule {
    id 3
    type replicated
    min_size 1
    max_size 10
    step take default
    step take DC1
    step choose firstn 0 type host
    step chooseleaf firstn 2 type osd
    step emit
    step take default
    step take DC2
    step choose firstn 0 type host
    step chooseleaf firstn 2 type osd
    step emit
}
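
As a sanity check, the mappings this rule produces can be verified offline with crushtool before pointing the pool at it; a quick sketch (rule id 3, four replicas, file names arbitrary):

# export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# simulate the rule; --show-bad-mappings prints only inputs
# that map fewer than 4 OSDs, so a sound rule prints nothing
crushtool -i crushmap.bin --test --rule 3 --num-rep 4 --show-mappings
crushtool -i crushmap.bin --test --rule 3 --num-rep 4 --show-bad-mappings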

ceph pg ls for pool 3 (SSD):

PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE  SINCE  VERSION  REPORTED  UP  ACTING  SCRUB_STAMP  DEEP_SCRUB_STAMP
3.0 184 0 0 0 738070528 0 0 7458 active+clean 9m 926'7458 951:18695 [7,6,5,4]p7 [7,6,5,4]p7 2024-06-05T08:52:51.951936+0200 2024-06-05T08:52:51.951936+0200
3.1 169 0 0 0 667156655 0 0 3242 active+clean 18m 926'3242 951:14151 [4,5,6,7]p4 [4,5,6,7]p4 2024-06-05T08:54:42.948682+0200 2024-06-05T08:54:42.948682+0200
3.2 221 0 0 0 885641968 0 0 6989 active+clean 9m 926'6989 951:17645 [4,5,7,6]p4 [4,5,7,6]p4 2024-06-05T08:53:27.981787+0200 2024-06-05T08:53:27.981787+0200
3.3 180 0 0 0 716509184 0 0 3194 active+clean 9m 926'3194 951:14191 [4,5,6,7]p4 [4,5,6,7]p4 2024-06-05T08:54:29.584216+0200 2024-06-05T08:54:29.584216+0200
3.4 189 0 0 0 754417698 137 8 3616 active+clean 9m 926'3616 951:19245 [6,7,4,5]p6 [6,7,4,5]p6 2024-06-05T08:54:02.307323+0200 2024-06-05T08:54:02.307323+0200
3.5 188 0 0 0 742543377 0 0 5992 active+clean 9m 926'5992 951:18862 [6,7,5,4]p6 [6,7,5,4]p6 2024-06-05T08:53:09.483136+0200 2024-06-05T08:53:09.483136+0200
3.6 191 0 0 0 769482752 150 16 6810 active+clean 9m 926'6810 951:30043 [7,6,4,5]p7 [7,6,4,5]p7 2024-06-05T08:53:46.646517+0200 2024-06-05T08:53:46.646517+0200
3.7 170 0 0 0 681587379 0 0 10081 active+clean 9m 926'21581 951:28473 [4,5,7,6]p4 [4,5,7,6]p4 2024-06-05T08:54:16.047967+0200 2024-06-05T08:54:16.047967+0200


ceph pg ls after the new CRUSH rule is selected for the SSD pool:

PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE  SINCE  VERSION  REPORTED  UP  ACTING  SCRUB_STAMP  DEEP_SCRUB_STAMP
3.0 184 372 186 0 738070528 0 0 7458 active+recovery_wait+undersized+degraded+remapped 14s 926'7458 955:18688 [1,0,3,7]p1 [4,7]p4 2024-06-05T08:52:51.951936+0200 2024-06-05T08:52:51.951936+0200
3.1 169 0 0 0 667156655 0 0 3242 active+clean 19m 926'3242 954:14154 [4,5,6,7]p4 [4,5,6,7]p4 2024-06-05T08:54:42.948682+0200 2024-06-05T08:54:42.948682+0200
3.2 221 444 0 0 885641968 0 0 6989 active+recovery_wait+undersized+degraded+remapped 14s 926'6989 955:17657 [4,5,3,2]p4 [4,5]p4 2024-06-05T08:53:27.981787+0200 2024-06-05T08:53:27.981787+0200
3.3 180 0 540 0 716509184 0 0 3194 active+recovering+undersized+degraded+remapped 15s 926'3194 955:14204 [4,0,2,3]p4 [4,0]p4 2024-06-05T08:54:29.584216+0200 2024-06-05T08:54:29.584216+0200
3.4 189 378 0 0 754417698 0 0 3616 active+recovery_wait+undersized+degraded+remapped 15s 926'3616 955:19220 [1,4,6,2]p1 [4,6]p4 2024-06-05T08:54:02.307323+0200 2024-06-05T08:54:02.307323+0200
3.5 188 189 0 0 742543377 0 0 5992 active+recovery_wait+undersized+degraded+remapped 15s 926'5992 955:18845 [5,4,6,3]p5 [5,4,6]p5 2024-06-05T08:53:09.483136+0200 2024-06-05T08:53:09.483136+0200
3.6 191 390 195 0 769482752 0 0 6810 active+recovery_wait+undersized+degraded+remapped 14s 926'6810 955:30016 [0,1,2,7]p0 [4,7]p4 2024-06-05T08:53:46.646517+0200 2024-06-05T08:53:46.646517+0200
3.7 170 0 170 0 681587379 0 0 10081 active+remapped+backfill_wait 14s 926'21581 955:28486 [4,5,3,6]p4 [4,5,6,7]p4 2024-06-05T08:54:16.047967+0200 2024-06-05T08:54:16.047967+0200


So CRUSH is able to find a suitable mapping just fine, but somehow Ceph decides to drop up to two healthy copies from its acting set, and I do not understand why. I would expect only misplaced objects at this point.
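
To dig into why replicas get dropped, the peering state of an affected PG can be dumped; a small sketch using PG 3.0 from the listing above (jq is only used for readability):

# peering/recovery history of one degraded PG
ceph pg 3.0 query | jq '.recovery_state'

# UP vs. ACTING for all PGs at a glance
ceph pg dump pgs_brief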

Ceph version is 15.2.17. Latest CRUSH tunables (ceph osd crush tunables optimal).
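
(For completeness, both can be confirmed with standard commands:)

# daemon versions across the cluster
ceph versions

# active CRUSH tunables profile
ceph osd crush show-tunables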

Am I missing something obvious here? If so, would you please point it out to me? :D

Gr. Stefan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


