Re: All PGs are active+clean, still remapped PGs

> On 2 November 2016 at 16:21, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> 
> 
> On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > > I'm pretty sure this is a race condition that got cleaned up as part of 
> > > > > https://github.com/ceph/ceph/pull/9078/commits.  The mon only checks the 
> > > > > pg_temp entries that are getting set/changed, and since those are already 
> > > > > in place it doesn't recheck them.  Any poke to the cluster that triggers 
> > > > > peering ought to be enough to clear it up.  So, no need for logs, thanks!
> > > > > 
> > > > 
> > > > Ok, just checking.
> > > > 
> > > > > We could add a special check during, say, upgrade, but generally the PGs 
> > > > > will re-peer as the OSDs restart anyway and that will clear it up.
> > > > > 
> > > > > Maybe you can just confirm that marking an osd down (say, ceph osd down 
> > > > > 31) is also enough to remove the stray entry?
> > > > > 
> > > > 
> > > > I already tried restarting the OSDs, but that didn't help. I also marked osd 31, 160 and 138 as down for PG 4.862, but the pg_temp entry remained:
> > > > 
> > > > pg_temp 4.862 [31,160,138,2]
> > > > 
> > > > But this works:
> > > > 
> > > > root@mon1:~# ceph osd dump|grep pg_temp
> > > > pg_temp 4.862 [31,160,138,2]
> > > > pg_temp 4.a83 [156,83,10,7]
> > > > pg_temp 4.e8e [164,78,10,8]
> > > > root@mon1:~# ceph osd pg-temp 4.862 31
> > > > set 4.862 pg_temp mapping to [31]
> > > > root@mon1:~# ceph osd dump|grep pg_temp
> > > > pg_temp 4.a83 [156,83,10,7]
> > > > pg_temp 4.e8e [164,78,10,8]
> > > > root@mon1:~#
> > > > 
> > > > So neither the restarts nor the marking down fixed the issue; only the pg-temp trick did.
> > > > 
> > > > I still have two PGs left that I can test with.
> > > 
> > > Hmm.  Did you leave the OSD down long enough for the PG to peer without 
> > > it?  Can you confirm that doesn't work?
> > > 
> > 
> > I stopped osd.31, waited for all PGs to re-peer, waited another minute or so and started it again, but that didn't work: the pg_temp entry wasn't removed.
> > 
> > The whole cluster runs 0.94.9.
> 
> Hrmpf.  Well, I guess that means a special case on upgrade would be 
> helpful.  Not convinced it's the most important thing though, given this 
> is probably a pretty rare case and can be fixed manually.  (OTOH, most 
> operators won't know that...)
> 

Yes, I think so. It's on the ML now, so search engines can find it if needed!

I'm fixing the PGs manually now so that the MON stores can start to trim; the per-PG commands are sketched below.
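
For the record, the manual fix per PG looks like this (shown here for 4.a83; by analogy with the 4.862 run above I'm using the first OSD of the stray entry, 156, and the "set ..." line is the output I expect, not a captured transcript):

root@mon1:~# ceph osd pg-temp 4.a83 156
set 4.a83 pg_temp mapping to [156]
root@mon1:~# ceph osd dump|grep pg_temp
pg_temp 4.e8e [164,78,10,8]

As I understand it, explicitly (re)setting pg_temp forces the mon to re-check that entry, and since the PG is already active+clean on its up set, the stray mapping is dropped right away.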

Wido

> sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


