> On 24 October 2016 at 22:41, David Turner <david.turner@xxxxxxxxxxxxxxxx> wrote:
>
> More to my curiosity on this. Our clusters leave behind /var/lib/ceph/osd/ceph-##/current/pg_temp folders on occasion. If you check all of the pg_temp folders for osd.10, you might find something that is holding onto the PG even though it has really moved on.

Thanks, but osd.10 is already down and out. The disk has been broken for a while now.

Wido
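In case it helps anyone hitting the same thing, a minimal sketch of the check David describes, assuming filestore OSDs with their data under /var/lib/ceph/osd/ceph-<id>/current and leftover temp PG directories whose names contain "temp"; the exact directory naming differs between releases, so treat the pattern as an assumption and adjust it to whatever your OSDs actually create:

    # Run on each OSD host. Read-only: it only lists candidate leftover temp
    # PG directories under the filestore data dirs so they can be compared
    # against the pg_temp entries in the OSDMap; it does not delete anything.
    for d in /var/lib/ceph/osd/ceph-*/current; do
        echo "== $d =="
        find "$d" -maxdepth 1 -type d -iname '*temp*'
    done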
> ________________________________
> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of David Turner [david.turner@xxxxxxxxxxxxxxxx]
> Sent: Monday, October 24, 2016 2:24 PM
> To: Wido den Hollander; ceph-users@xxxxxxxx
> Subject: Re: All PGs are active+clean, still remapped PGs
>
> Are you running a replica size of 4? If not, these might errantly be reported as being on 10.
>
> ________________________________________
> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Wido den Hollander [wido@xxxxxxxx]
> Sent: Monday, October 24, 2016 2:19 PM
> To: ceph-users@xxxxxxxx
> Subject: All PGs are active+clean, still remapped PGs
>
> Hi,
>
> On a cluster running Hammer 0.94.9 (upgraded from Firefly) I have 29 remapped PGs according to the OSDMap, but all PGs are active+clean.
>
>   osdmap e111208: 171 osds: 166 up, 166 in; 29 remapped pgs
>    pgmap v101069070: 6144 pgs, 2 pools, 90122 GB data, 22787 kobjects
>          264 TB used, 184 TB / 448 TB avail
>              6144 active+clean
>
> The OSDMap shows:
>
> root@mon1:~# ceph osd dump | grep pg_temp
> pg_temp 4.39 [160,17,10,8]
> pg_temp 4.52 [161,16,10,11]
> pg_temp 4.8b [166,29,10,7]
> pg_temp 4.b1 [5,162,148,2]
> pg_temp 4.168 [95,59,6,2]
> pg_temp 4.1ef [22,162,10,5]
> pg_temp 4.2c9 [164,95,10,7]
> pg_temp 4.330 [165,154,10,8]
> pg_temp 4.353 [2,33,18,54]
> pg_temp 4.3f8 [88,67,10,7]
> pg_temp 4.41a [30,59,10,5]
> pg_temp 4.45f [47,156,21,2]
> pg_temp 4.486 [138,43,10,7]
> pg_temp 4.674 [59,18,7,2]
> pg_temp 4.7b8 [164,68,10,11]
> pg_temp 4.816 [167,147,57,2]
> pg_temp 4.829 [82,45,10,11]
> pg_temp 4.843 [141,34,10,6]
> pg_temp 4.862 [31,160,138,2]
> pg_temp 4.868 [78,67,10,5]
> pg_temp 4.9ca [150,68,10,8]
> pg_temp 4.a83 [156,83,10,7]
> pg_temp 4.a98 [161,94,10,7]
> pg_temp 4.b80 [162,88,10,8]
> pg_temp 4.d41 [163,52,10,6]
> pg_temp 4.d54 [149,140,10,7]
> pg_temp 4.e8e [164,78,10,8]
> pg_temp 4.f2a [150,68,10,6]
> pg_temp 4.ff3 [30,157,10,7]
> root@mon1:~#
>
> So I tried to restart osd.160 and osd.161, but that didn't change the state.
>
> root@mon1:~# ceph pg 4.39 query
> {
>     "state": "active+clean",
>     "snap_trimq": "[]",
>     "epoch": 111212,
>     "up": [
>         160,
>         17,
>         8
>     ],
>     "acting": [
>         160,
>         17,
>         8
>     ],
>     "actingbackfill": [
>         "8",
>         "17",
>         "160"
>     ],
>
> In all these PGs osd.10 is involved, but that OSD is down and out. I tried marking it as down again, but that didn't work.
>
> I haven't tried removing osd.10 from the CRUSH map yet, since that will trigger a rather large rebalance.
>
> This cluster is still running with the Dumpling tunables, though, so that might be the issue. But before I trigger a very large rebalance I wanted to check whether there are any insights on this one.
>
> Thanks,
>
> Wido
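For completeness, a rough sketch of the path Wido is weighing, using only standard CLI commands run on a monitor node. The last three steps remove osd.10 from the cluster entirely and will trigger the rebalance he mentions; whether that also makes the monitors drop the stale pg_temp entries is an expectation here, not something this thread confirms, so treat it as an illustration rather than a recommendation.

    # Confirm the cluster is still on the legacy/dumpling tunables profile.
    ceph osd crush show-tunables

    # Usual sequence for removing a dead OSD for good. Taking it out of the
    # CRUSH map starts the data movement, so plan for the rebalance first.
    ceph osd crush remove osd.10   # remove it from the CRUSH map
    ceph auth del osd.10           # drop its cephx key
    ceph osd rm 10                 # remove it from the OSD map; the hope is
                                   # that the stale pg_temp entries go with it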