Re: 9 PGs stay incomplete

----- Original Message -----
> From: "Wido den Hollander" <wido@xxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxx>
> Sent: Friday, 11 September, 2015 6:46:11 AM
> Subject:  9 PGs stay incomplete
> 
> Hi,
> 
> I'm running into an issue with Ceph 0.94.2/3 where, after doing a
> recovery test, 9 PGs stay incomplete:
> 
> osdmap e78770: 2294 osds: 2294 up, 2294 in
> pgmap v1972391: 51840 pgs, 7 pools, 220 TB data, 185 Mobjects
>        755 TB used, 14468 TB / 15224 TB avail
>           51831 active+clean
>               9 incomplete
> 
> As you can see, all 2294 OSDs are online and almost all PGs became
> active+clean again, except for 9.
> 
> I found out that these PGs are the problem:
> 
> 10.3762
> 7.309e
> 7.29a2
> 10.2289
> 7.17dd
> 10.165a
> 7.1050
> 7.c65
> 10.abf
> 
> Digging further, all of these PGs map back to an OSD running on the
> same host, 'ceph-stg-01' in this case.
> 
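To double-check that mapping in one go, a quick loop over 'ceph pg map'
(which prints the up and acting set per PG) should show whether all nine
really point at OSDs on that one host:

$ for pg in 10.3762 7.309e 7.29a2 10.2289 7.17dd 10.165a 7.1050 7.c65 10.abf; do ceph pg map $pg; done
$ ceph osd find 1420    # shows the host an OSD lives on
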
> $ ceph pg 10.3762 query
> 
> Looking at the recovery state, this is shown:
> 
>                 {
>                     "first": 65286,
>                     "last": 67355,
>                     "maybe_went_rw": 0,
>                     "up": [
>                         1420,
>                         854,
>                         1105

Anything interesting in the OSD logs for these OSDs?

>                     ],
>                     "acting": [
>                         1420
>                     ],
>                     "primary": 1420,
>                     "up_primary": 1420
>                 },
> 
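On the log question: if nothing obvious shows up at the default level,
raising the debug level on the primary while forcing a re-peer usually
reveals where peering gets stuck. A rough sketch, using runtime injection
so no restart is needed:

$ ceph tell osd.1420 injectargs '--debug-osd 20 --debug-ms 1'
$ ceph osd down 1420    # marks it down in the map; the daemon re-asserts itself and re-peers
$ ceph tell osd.1420 injectargs '--debug-osd 0/5 --debug-ms 0/5'

Then grep /var/log/ceph/ceph-osd.1420.log for one of the stuck PG ids
around the peering attempt.
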
> osd.1420 is online. I tried restarting it, but nothing changes: these 9
> PGs stay incomplete.
> 
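If a restart doesn't change anything, it can also be worth checking which
OSDs peering is still waiting on; pg query normally records that in the
current recovery state. Something along these lines, assuming jq is
available (field names from memory of hammer's output, so adjust as
needed):

$ ceph pg 10.3762 query | jq '.recovery_state[0] | {name, probing_osds, down_osds_we_would_probe, peering_blocked_by}'

An empty down_osds_we_would_probe with the PG still incomplete suggests
the problem is missing log/history coverage rather than an unreachable
OSD.
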
> Under 'peer_info' I see both osd.854 and osd.1105 reporting the PG
> with identical numbers.
> 
> I restarted both 854 and 1105, without result.
> 
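Given that both peers report identical numbers, diffing exactly what the
primary believes against the peers may narrow it down. Again assuming jq,
with key names as in hammer's pg query JSON:

$ ceph pg 10.3762 query | jq '{primary: .info.last_update, peers: [.peer_info[] | {peer, last_update, last_epoch_started: .history.last_epoch_started}]}'

A last_epoch_started mismatch between the primary's info and the peers'
history is often what keeps a PG incomplete.
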
> The output of PG query can be found here: http://pastebin.com/qQL699zC
> 
> The cluster is running a mix of 0.94.2 and 0.94.3 on Ubuntu 14.04.2
> with the 3.13 kernel. XFS is used as the backing filesystem.
> 
> Any suggestions to fix this issue? There is no valuable data in these
> pools, so I could remove them, but I'd rather fix the root cause.
> 
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on