Ok, so it's a replica-3 pool, and OSDs 68 & 69 are on the same host.

On Friday, 21 September 2018 at 11:09 +0000, Eugen Block wrote:
> > The cache tier on this pool has 26GB of data (for 5.7TB of data on
> > the EC pool).
> > We tried to flush the cache tier and restart OSDs 68 & 69, without
> > any success.
>
> I meant the replication size of the pool:
>
> ceph osd pool ls detail | grep <CACHE_TIER>
>
> In the experimental state of our cluster we had a cache tier (for the
> rbd pool) with size 2, which can cause problems during recovery. Since
> only OSDs 68 and 69 are mentioned, I was wondering if your cache tier
> also has size 2.
>
> Quoting Olivier Bonvalet <ceph.list@xxxxxxxxx>:
>
> > Hi,
> >
> > the cache tier on this pool has 26GB of data (for 5.7TB of data on
> > the EC pool).
> > We tried to flush the cache tier and restart OSDs 68 & 69, without
> > any success.
> >
> > But I don't see any related data on the cache-tier OSDs (filestore)
> > with:
> >
> > find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'
> >
> > I don't see any useful information in the logs. Maybe I should
> > increase the log level?
> >
> > Thanks,
> >
> > Olivier
> >
> > On Friday, 21 September 2018 at 09:34 +0000, Eugen Block wrote:
> > > Hi Olivier,
> > >
> > > what size does the cache tier have? You could set cache-mode to
> > > forward and flush it; maybe restarting those OSDs (68, 69) helps,
> > > too. Or there could be an issue with the cache tier: what do those
> > > logs say?
> > >
> > > Regards,
> > > Eugen
> > >
> > > Quoting Olivier Bonvalet <ceph.list@xxxxxxxxx>:
> > >
> > > > Hello,
> > > >
> > > > on a Luminous cluster, I have an incomplete PG and I can't find
> > > > how to fix that.
> > > >
> > > > It's an EC pool (4+2):
> > > >
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > > 'incomplete')
> > > >
> > > > Of course, we can't reduce min_size below 4.
> > > >
> > > > And the full state: https://pastebin.com/zrwu5X0w
> > > >
> > > > So IO is blocked, and we can't access the damaged data.
> > > > OSDs block too:
> > > >
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > >
> > > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are used
> > > > for cache tiering.
> > > >
> > > > Any idea how I can fix that?
> > > >
> > > > Thanks,
> > > >
> > > > Olivier
> > > >
> > > > _______________________________________________
> > > > ceph-users mailing list
> > > > ceph-users@xxxxxxxxxxxxxx
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
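For readers following this thread, the diagnostic and cache-flush steps discussed above can be sketched roughly as below. This is only a sketch, not a fix: the pool name (bkp-sb-raid6), PG id (37.9c) and OSD id (32) come from the thread, while the cache-pool name "bkp-sb-raid6-cache" is an assumption; substitute your own names before running anything.

```shell
# Check the replication size of the pools involved (cache tier included)
ceph osd pool ls detail | grep bkp-sb-raid6

# Query the incomplete PG's full peering state; look in particular at
# "blocked_by" and "down_osds_we_would_probe" in the JSON output
ceph pg 37.9c query

# Put the cache tier into forward mode and flush/evict its objects
# ("bkp-sb-raid6-cache" is a hypothetical cache-pool name)
ceph osd tier cache-mode bkp-sb-raid6-cache forward --yes-i-really-mean-it
rados -p bkp-sb-raid6-cache cache-flush-evict-all

# Temporarily raise debug logging on the primary OSD (32) of pg 37.9c,
# then revert it once the logs have been captured
ceph tell osd.32 injectargs '--debug_osd 10 --debug_ms 1'
ceph tell osd.32 injectargs '--debug_osd 1 --debug_ms 0'
```

Note that flushing a cache tier with stuck requests on its OSDs may itself hang, which would be consistent with the behaviour reported earlier in the thread.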