Re: All PGs are active+clean, still remapped PGs

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 2 Nov 2016 15:21:08 +0000 (UTC)

On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > I'm pretty sure this is a race condition that got cleaned up as part of 
> > > > https://github.com/ceph/ceph/pull/9078/commits.  The mon only checks the 
> > > > pg_temp entries that are getting set/changed, and since those are already 
> > > > in place it doesn't recheck them.  Any poke to the cluster that triggers 
> > > > peering ought to be enough to clear it up.  So, no need for logs, thanks!
> > > > 
> > > 
> > > Ok, just checking.
> > > 
> > > > We could add a special check during, say, upgrade, but generally the PGs 
> > > > will re-peer as the OSDs restart anyway and that will clear it up.
> > > > 
> > > > Maybe you can just confirm that marking an osd down (say, ceph osd down 
> > > > 31) is also enough to remove the stray entry?
> > > > 
> > > 
> > > I already tried restarting the OSDs, but that didn't work. I also marked osd 31, 160 and 138 down for PG 4.862, but that didn't work either:
> > > 
> > > pg_temp 4.862 [31,160,138,2]
> > > 
> > > But this works:
> > > 
> > > root@mon1:~# ceph osd dump|grep pg_temp
> > > pg_temp 4.862 [31,160,138,2]
> > > pg_temp 4.a83 [156,83,10,7]
> > > pg_temp 4.e8e [164,78,10,8]
> > > root@mon1:~# ceph osd pg-temp 4.862 31
> > > set 4.862 pg_temp mapping to [31]
> > > root@mon1:~# ceph osd dump|grep pg_temp
> > > pg_temp 4.a83 [156,83,10,7]
> > > pg_temp 4.e8e [164,78,10,8]
> > > root@mon1:~#
> > > 
> > > So neither the restarts nor marking the OSDs down fixed the issue. Only the pg-temp trick did.
> > > 
> > > Still have two PGs left which I can test with.
> > 
> > Hmm.  Did you leave the OSD down long enough for the PG to peer without 
> > it?  Can you confirm that doesn't work?
> > 
> 
> I stopped osd.31, waited for all PGs to re-peer, waited another minute or so and started it again. That didn't work: the pg_temp entry wasn't resolved.
> 
> The whole cluster runs 0.94.9

Hrmpf.  Well, I guess that means a special case on upgrade would be 
helpful.  Not convinced it's the most important thing though, given this 
is probably a pretty rare case and can be fixed manually.  (OTOH, most 
operators won't know that...)
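
For anyone who hits this, the manual fix from the transcript quoted above
could be scripted roughly as below. This is only a sketch. The function name
pg_temp_fixups is made up for illustration, it assumes the
"pg_temp <pgid> [osds]" line format shown in the "ceph osd dump" output above,
and it assumes that pointing each entry at its first listed OSD (as was done
for 4.862) is the right thing for the other PGs too, which this thread does
not confirm. It only prints the commands; it does not run them.

```shell
# Read "ceph osd dump" output on stdin and print one "ceph osd pg-temp"
# command per pg_temp entry. Nothing is executed here.
# Expected input line format (taken from the transcript above):
#   pg_temp 4.862 [31,160,138,2]
pg_temp_fixups() {
    awk '$1 == "pg_temp" && $3 ~ /^\[[0-9]/ {
        osds = $3
        sub(/^\[/, "", osds)      # drop the leading bracket
        sub(/\]$/, "", osds)      # drop the trailing bracket
        split(osds, ids, ",")     # ids[1] is the first listed OSD
        # Assumption: re-setting the mapping to the first listed OSD
        # mirrors what was done by hand for PG 4.862.
        print "ceph osd pg-temp " $2 " " ids[1]
    }'
}
```

Something like "ceph osd dump | pg_temp_fixups" would then list the candidate
commands, which should be reviewed and run one at a time, checking
"ceph osd dump | grep pg_temp" after each.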

sage