Re: PG stuck incomplete

Yep:

pool 38 'cache-bkp-foo' replicated size 3 min_size 2 crush_rule 26
object_hash rjenkins pg_num 128 pgp_num 128 last_change 585369 lfor
68255/68255 flags hashpspool,incomplete_clones tier_of 37 cache_mode
readproxy target_bytes 209715200 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s
x2 decay_rate 0 search_last_n 0 min_read_recency_for_promote 10
min_write_recency_for_promote 2 stripe_width 0

I can't completely disable the cache tiering, because the OSDs are on
FileStore (so the EC base pool can't have the "overwrites" feature).
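As an untested sketch of what getting rid of the tier would involve (pool and object names taken from this thread; the tier-removal steps only apply once the cache pool has fully drained, which is exactly what is failing here):

```shell
# Try to flush, then evict, the single stuck object explicitly
# (cache-try-flush is the non-blocking variant of cache-flush).
rados -p cache-bkp-foo cache-try-flush rbd_data.f66c92ae8944a.00000000000f2596
rados -p cache-bkp-foo cache-evict rbd_data.f66c92ae8944a.00000000000f2596

# Only if the cache pool ever drains completely: set the cache mode to
# "none" and detach the tier from the base pool (37 / bkp-sb-raid6).
ceph osd tier cache-mode cache-bkp-foo none --yes-i-really-mean-it
ceph osd tier remove-overlay bkp-sb-raid6
ceph osd tier remove bkp-sb-raid6 cache-bkp-foo
```

And as noted above, the tier can't really be removed while the base pool
is an EC pool on FileStore without overwrites, so the last three
commands are illustrative only.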

On Friday, 21 September 2018 at 13:26 +0000, Eugen Block wrote:
> > I also switched the cache tier to "readproxy" to avoid using this
> > cache, but it's still blocked.
> 
> You could change the cache mode to "none" to disable it. Could you  
> paste the output of:
> 
> ceph osd pool ls detail | grep cache-bkp-foo
> 
> 
> 
> Quoting Olivier Bonvalet <ceph.list@xxxxxxxxx>:
> 
> > In fact, one object (only one) seems to be blocked in the cache tier
> > (writeback).
> > 
> > I tried to flush the cache with
> > "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the
> > object "rbd_data.f66c92ae8944a.00000000000f2596".
> > 
> > So I shrank the cache tier (a lot) to 200 MB;
> > "rados -p cache-bkp-foo ls" now shows only 3 objects:
> > 
> >     rbd_directory
> >     rbd_data.f66c92ae8944a.00000000000f2596
> >     rbd_header.f66c92ae8944a
> > 
> > And "cache-flush-evict-all" still hangs.
> > 
> > I also switched the cache tier to "readproxy" to avoid using this
> > cache, but it's still blocked.
> > 
> > 
> > 
> > 
> > On Friday, 21 September 2018 at 02:14 +0200, Olivier Bonvalet
> > wrote:
> > > Hello,
> > > 
> > > on a Luminous cluster, I have an incomplete PG and I can't find
> > > how to fix it.
> > > 
> > > It's an EC pool (4+2):
> > > 
> > >     pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete')
> > > 
> > > Of course, we can't reduce min_size from 4, since k=4 for this EC
> > > pool.
> > > 
> > > And the full state : https://pastebin.com/zrwu5X0w
> > > 
> > > So IO is blocked and we can't access the damaged data.
> > > OSDs are blocked too:
> > >     osds 32,68,69 have stuck requests > 4194.3 sec
> > > 
> > > OSD 32 is the primary of this PG.
> > > And OSDs 68 and 69 belong to the cache tier.
> > > 
> > > Any idea how I can fix this?
> > > 
> > > Thanks,
> > > 
> > > Olivier
> > > 
> > > 
> > 
> 
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



