Cool, writing some objects to the affected PGs has stopped the
consistent/inconsistent cycle. I'll keep an eye on them, but this seems
to have fixed the problem. Thanks!!

Dan

On Wed, Jul 22, 2015 at 6:07 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Looks like it's just a stat error. The primary appears to have the
> correct stats, but the replica for some reason doesn't (it thinks
> there's an object). I bet it clears itself if you perform a write on
> the PG, since the primary will send over its stats. We'd need
> information from when the stat error originally occurred to debug
> further.
> -Sam
>
> ----- Original Message -----
> From: "Dan van der Ster" <dan@xxxxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, July 22, 2015 7:49:00 AM
> Subject: PGs going inconsistent after stopping the primary
>
> Hi Ceph community,
>
> Env: hammer 0.94.2, Scientific Linux 6.6, kernel 2.6.32-431.5.1.el6.x86_64
>
> We wanted to post here before opening a tracker issue, to see if
> someone else has had this problem.
>
> We have a few PGs (in different pools) which get marked inconsistent
> when we stop the primary OSD. The problem is strange because once we
> restart the primary and then scrub the PG, the PG is marked
> active+clean. But inevitably, the next time we stop the primary OSD,
> the same PG is marked inconsistent again.
>
> There is no user activity on these PGs, and nothing interesting is
> logged on either of the 2nd/3rd OSDs (with debug_osd=20, the first
> line mentioning the PG already says inactive+inconsistent).
>
> We suspect this is related to garbage files left in the PG folder.
> One of our PGs behaves basically as above, except that it goes
> through this cycle:
>
> active+clean -> (deep-scrub) -> active+clean+inconsistent ->
> (repair) -> active+clean -> (restart primary OSD) ->
> (deep-scrub) -> active+clean+inconsistent
>
> This one at least logs:
>
> 2015-07-22 16:42:41.821326 osd.303 [INF] 55.10d deep-scrub starts
> 2015-07-22 16:42:41.823834 osd.303 [ERR] 55.10d deep-scrub stat
> mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty, 0/0 omap, 0/0
> hit_set_archive, 0/0 whiteouts, 0/0 bytes, 0/0 hit_set_archive bytes.
> 2015-07-22 16:42:41.823842 osd.303 [ERR] 55.10d deep-scrub 1 errors
>
> This should be debuggable, because there is only one object in the pool:
>
> tapetest 55 0 0 73575G 1
>
> even though rados ls returns no objects:
>
> # rados ls -p tapetest
> #
>
> Any ideas?
>
> Cheers, Dan
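
For anyone hitting the same symptom, here is a minimal sketch of the
workaround Dan describes, assuming the pool and PG from this thread
(tapetest, PG 55.10d). The object name pg-stat-sync is hypothetical;
you may need to try several names until ceph osd map reports one that
lands in the affected PG:

    # check which PG a candidate object name maps to
    ceph osd map tapetest pg-stat-sync

    # write a small object to the PG; per Sam's explanation, the write
    # makes the primary send its (correct) stats over to the replicas
    echo sync > /tmp/sync.txt
    rados -p tapetest put pg-stat-sync /tmp/sync.txt

    # deep-scrub again and confirm the PG returns to active+clean
    ceph pg deep-scrub 55.10d
    ceph pg 55.10d query | grep '"state"'

    # remove the temporary object afterwards
    rados -p tapetest rm pg-stat-sync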