Re: PG stuck incomplete

Olivier Bonvalet <ceph.list@xxxxxxxxx> · Fri, 21 Sep 2018 14:39:21 +0200

In fact, one object (only one) seems to be blocked on the cache tier
(writeback).

I tried to flush the cache with
"rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the
object "rbd_data.f66c92ae8944a.00000000000f2596".

So I reduced the cache tier (a lot) to 200MB; "rados -p cache-bkp-foo
ls" now shows only 3 objects:

    rbd_directory
    rbd_data.f66c92ae8944a.00000000000f2596
    rbd_header.f66c92ae8944a

And "cache-flush-evict-all" still hangs.

I also switched the cache tier to "readproxy" to avoid using this
cache, but it's still blocked.
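
For anyone wanting to replay these steps, here is a minimal Python
sketch that only builds and prints the command lines by default. The
pool and object names are the ones from this message. The rest is
assumed, not taken from what I actually typed: that the cache size was
reduced through "target_max_bytes", that the mode was switched with
"ceph osd tier cache-mode", and that the per-object "cache-flush" and
"cache-evict" subcommands of rados are the right way to target the one
stuck object. The helper names (cache_tier_commands, run) are
hypothetical.

```python
import shlex
import subprocess


def cache_tier_commands(pool, obj, target_max_bytes=200 * 1024 * 1024):
    """Return, in order, the argv lists for the steps described above."""
    return [
        # Shrink the cache tier to ~200MB (assumed: via target_max_bytes).
        ["ceph", "osd", "pool", "set", pool,
         "target_max_bytes", str(target_max_bytes)],
        # List what is still held in the cache pool.
        ["rados", "-p", pool, "ls"],
        # Switch the tier to readproxy (assumed command form).
        ["ceph", "osd", "tier", "cache-mode", pool, "readproxy"],
        # Flush, then evict, only the stuck object (assumed approach).
        ["rados", "-p", pool, "cache-flush", obj],
        ["rados", "-p", pool, "cache-evict", obj],
    ]


def run(commands, dry_run=True, timeout=60):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # The timeout keeps a hanging flush from blocking forever;
            # subprocess.TimeoutExpired is raised when it is exceeded.
            subprocess.run(argv, check=True, timeout=timeout)


if __name__ == "__main__":
    run(cache_tier_commands(
        "cache-bkp-foo", "rbd_data.f66c92ae8944a.00000000000f2596"))
```

With dry_run left at True nothing touches the cluster; the timeout is
there because the flush is exactly the step that hangs here.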

On Friday, 21 September 2018 at 02:14 +0200, Olivier Bonvalet wrote:
> Hello,
> 
> on a Luminous cluster, I have an incomplete PG and I can't find how
> to fix it.
> 
> It's an EC pool (4+2):
> 
>     pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> 'incomplete')
> 
> Of course, we can't reduce min_size from 4.
> 
> And the full state: https://pastebin.com/zrwu5X0w
> 
> So I/O is blocked and we can't access the damaged data.
> Some OSDs are blocked too:
>     osds 32,68,69 have stuck requests > 4194.3 sec
> 
> OSD 32 is the primary of this PG.
> And OSD 68 and 69 are for cache tiering.
> 
> Any idea how I can fix that?
> 
> Thanks,
> 
> Olivier
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
