More to my curiosity on this: our clusters occasionally leave behind /var/lib/ceph/osd/ceph-##/current/pg_temp folders. If you check all of the pg_temp folders for osd.10, you
might find something that's holding onto the PG even though it has really moved on.
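For example, something along these lines on the host that carried osd.10 (a rough sketch; the FileStore path layout and the OSD mount point are assumptions, adjust the OSD id and paths to your setup):

    # look for leftover temp PG directories under the OSD's data dir
    ls -d /var/lib/ceph/osd/ceph-10/current/*_TEMP
    ls -d /var/lib/ceph/osd/ceph-10/current/pg_temp*

Anything that shows up there is worth comparing against the pg_temp entries in the osdmap below.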
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of David Turner [david.turner@xxxxxxxxxxxxxxxx]
Sent: Monday, October 24, 2016 2:24 PM
To: Wido den Hollander; ceph-users@xxxxxxxx
Subject: Re: [ceph-users] All PGs are active+clean, still remapped PGs

Are you running a replica size of 4? If not, these might be erroneously reported as being on 10.
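If you want to double-check, the replica size can be read per pool with something like this (the pool name 'rbd' is only a placeholder, substitute your own):

    # size of a single pool
    ceph osd pool get rbd size
    # or the replicated size of every pool at once
    ceph osd dump | grep 'replicated size'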
________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Wido den Hollander [wido@xxxxxxxx]
Sent: Monday, October 24, 2016 2:19 PM
To: ceph-users@xxxxxxxx
Subject: [ceph-users] All PGs are active+clean, still remapped PGs

Hi,

On a cluster running Hammer 0.94.9 (upgraded from Firefly) I have 29 remapped PGs according to the OSDMap, but all PGs are active+clean.

     osdmap e111208: 171 osds: 166 up, 166 in; 29 remapped pgs
      pgmap v101069070: 6144 pgs, 2 pools, 90122 GB data, 22787 kobjects
            264 TB used, 184 TB / 448 TB avail
                6144 active+clean

The OSDMap shows:

root@mon1:~# ceph osd dump|grep pg_temp
pg_temp 4.39 [160,17,10,8]
pg_temp 4.52 [161,16,10,11]
pg_temp 4.8b [166,29,10,7]
pg_temp 4.b1 [5,162,148,2]
pg_temp 4.168 [95,59,6,2]
pg_temp 4.1ef [22,162,10,5]
pg_temp 4.2c9 [164,95,10,7]
pg_temp 4.330 [165,154,10,8]
pg_temp 4.353 [2,33,18,54]
pg_temp 4.3f8 [88,67,10,7]
pg_temp 4.41a [30,59,10,5]
pg_temp 4.45f [47,156,21,2]
pg_temp 4.486 [138,43,10,7]
pg_temp 4.674 [59,18,7,2]
pg_temp 4.7b8 [164,68,10,11]
pg_temp 4.816 [167,147,57,2]
pg_temp 4.829 [82,45,10,11]
pg_temp 4.843 [141,34,10,6]
pg_temp 4.862 [31,160,138,2]
pg_temp 4.868 [78,67,10,5]
pg_temp 4.9ca [150,68,10,8]
pg_temp 4.a83 [156,83,10,7]
pg_temp 4.a98 [161,94,10,7]
pg_temp 4.b80 [162,88,10,8]
pg_temp 4.d41 [163,52,10,6]
pg_temp 4.d54 [149,140,10,7]
pg_temp 4.e8e [164,78,10,8]
pg_temp 4.f2a [150,68,10,6]
pg_temp 4.ff3 [30,157,10,7]
root@mon1:~#

So I tried to restart osd.160 and osd.161, but that didn't change the state.

root@mon1:~# ceph pg 4.39 query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "epoch": 111212,
    "up": [
        160,
        17,
        8
    ],
    "acting": [
        160,
        17,
        8
    ],
    "actingbackfill": [
        "8",
        "17",
        "160"
    ],

In all these PGs osd.10 is involved, but that OSD is down and out. I tried marking it as down again, but that didn't work.

I haven't tried removing osd.10 from the CRUSHMap yet, since that will trigger a rather large rebalance.

This cluster is still running with the Dumpling tunables though, so that might be the issue. But before I trigger a very large rebalance I wanted to check if there are any insights on this one.

Thanks,

Wido
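For reference, a few quick checks matching the steps above (a minimal sketch; the grep patterns are assumptions about the osd dump and osd tree output format):

    # pg_temp entries in the OSDMap that still reference osd.10
    ceph osd dump | grep '^pg_temp' | grep -w 10

    # confirm osd.10 really is down and out
    ceph osd tree | grep -w 'osd\.10'

    # show which CRUSH tunables profile the cluster is running
    ceph osd crush show-tunables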
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com