Hi all,
We can sometimes observe that a PG's acting set seems to violate the CRUSH rule. For example, we previously had this environment:
[root@Ann-per-R7-3 /]# ceph -s
  cluster:
    id:     248ce880-f57b-4a4c-a53a-3fc2b3eb142a
    health: HEALTH_WARN
            34/8019 objects misplaced (0.424%)

  services:
    mon: 3 daemons, quorum Ann-per-R7-3,Ann-per-R7-7,Ann-per-R7-1
    mgr: Ann-per-R7-3(active), standbys: Ann-per-R7-7, Ann-per-R7-1
    mds: cephfs-1/1/1 up {0=qceph-mds-Ann-per-R7-1=up:active}, 2 up:standby
    osd: 7 osds: 7 up, 7 in; 1 remapped pgs

  data:
    pools:   7 pools, 128 pgs
    objects: 2.67 k objects, 10 GiB
    usage:   107 GiB used, 3.1 TiB / 3.2 TiB avail
    pgs:     34/8019 objects misplaced (0.424%)
             127 active+clean
             1   active+clean+remapped

[root@Ann-per-R7-3 /]# ceph pg ls remapped
PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES     LOG STATE                 STATE_STAMP                VERSION REPORTED UP      ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP
1.7 34      0        34        0       134217728 42  active+clean+remapped 2019-11-05 10:39:58.639533 144'42  229:407  [6,1]p6 [6,1,2]p6 2019-11-04 10:36:19.519820 2019-11-04 10:36:19.519820

[root@Ann-per-R7-3 /]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
-2       0       root perf_osd
-1       3.10864 root default
-7       0.44409     host Ann-per-R7-1
 5   hdd 0.44409         osd.5             up  1.00000 1.00000
-3       1.33228     host Ann-per-R7-3
 0   hdd 0.44409         osd.0             up  1.00000 1.00000
 1   hdd 0.44409         osd.1             up  1.00000 1.00000
 2   hdd 0.44409         osd.2             up  1.00000 1.00000
-9       1.33228     host Ann-per-R7-7
 6   hdd 0.44409         osd.6             up  1.00000 1.00000
 7   hdd 0.44409         osd.7             up  1.00000 1.00000
 8   hdd 0.44409         osd.8             up  1.00000 1.00000

[root@Ann-per-R7-3 /]# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
 5   hdd 0.44409  1.00000 465 GiB  21 GiB 444 GiB 4.49 1.36 127
 0   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.16 0.96  44
 1   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.14 0.95  52
 2   hdd 0.44409  1.00000 465 GiB  14 GiB 451 GiB 2.98 0.91  33
 6   hdd 0.44409  1.00000 465 GiB  14 GiB 451 GiB 2.97 0.90  43
 7   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.19 0.97  41
 8   hdd 0.44409  1.00000 465 GiB  14 GiB 450 GiB 3.09 0.94  44
                    TOTAL 3.2 TiB 107 GiB 3.1 TiB 3.29
MIN/MAX VAR: 0.90/1.36  STDDEV: 0.49
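(For completeness, the rule assigned to the pool and the PG mapping can be confirmed with commands like the following; the pool and rule names here are placeholders, not our actual names:

[root@Ann-per-R7-3 /]# ceph osd pool get <pool-name> crush_rule
[root@Ann-per-R7-3 /]# ceph osd crush rule dump <rule-name>
[root@Ann-per-R7-3 /]# ceph pg map 1.7
)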
Based on our CRUSH map, the rule should select one OSD from each host. However, from the output above, the acting set of PG 1.7 is [6,1,2], and osd.1 and osd.2 are on the same host (Ann-per-R7-3), which seems to violate the CRUSH rule. So my question is: how does this happen? Any enlightenment is much appreciated.
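For reference, the kind of rule I mean is the standard replicated one; a sketch of what it looks like in a decompiled CRUSH map (the rule name and id here are assumptions, our actual map may differ slightly):

rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

The "step chooseleaf firstn 0 type host" line is what should place each replica on a distinct host.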
Best
Cian