Re: PGs going inconsistent after stopping the primary

Cool, writing some objects to the affected PGs has stopped the
consistent/inconsistent cycle. I'll keep an eye on them but this seems
to have fixed the problem.
Thanks!!
Dan
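
For anyone who wants to try the same workaround: one possible way to land a write in a specific PG is to probe object names until one maps to it. This is only a sketch and not necessarily what was done here. It assumes that `ceph osd map <pool> <object> --format json` returns a JSON object with a "pgid" field; the "pgfix-" name prefix and both function names are made up for illustration.

```python
import json
import subprocess
from typing import Callable, Optional


def pg_of_object(pool: str, name: str) -> str:
    """Ask the cluster which PG an object name maps to.

    Assumption: `ceph osd map <pool> <object> --format json` prints a JSON
    object containing a "pgid" field such as "55.10d".
    """
    out = subprocess.check_output(
        ["ceph", "osd", "map", pool, name, "--format", "json"]
    )
    return json.loads(out)["pgid"]


def find_name_for_pg(
    pool: str,
    target_pgid: str,
    lookup: Callable[[str, str], str] = pg_of_object,
    prefix: str = "pgfix-",  # hypothetical naming scheme
    max_tries: int = 100000,
) -> Optional[str]:
    """Return the first candidate object name that maps to target_pgid.

    Candidate names are prefix + counter. Returns None if no candidate
    maps to the PG within max_tries attempts.
    """
    for i in range(max_tries):
        name = f"{prefix}{i}"
        if lookup(pool, name) == target_pgid:
            return name
    return None
```

With a name in hand, something like `rados -p tapetest put <name> <some-file>` would perform the write. The lookup is passed in as a parameter so the search loop can be exercised without a cluster.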

On Wed, Jul 22, 2015 at 6:07 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Looks like it's just a stat error.  The primary appears to have the correct stats, but the replica doesn't (for some reason it thinks there's an object).  I bet it clears itself if you perform a write on the PG, since the primary will then send over its stats.  We'd need information from when the stat error originally occurred to debug further.
> -Sam
>
> ----- Original Message -----
> From: "Dan van der Ster" <dan@xxxxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, July 22, 2015 7:49:00 AM
> Subject:  PGs going inconsistent after stopping the primary
>
> Hi Ceph community,
>
> Env: hammer 0.94.2, Scientific Linux 6.6, kernel 2.6.32-431.5.1.el6.x86_64
>
> We wanted to post here before the tracker to see if someone else has
> had this problem.
>
> We have a few PGs (different pools) which get marked inconsistent when
> we stop the primary OSD. The problem is strange because once we
> restart the primary, then scrub the PG, the PG is marked active+clean.
> But inevitably next time we stop the primary OSD, the same PG is
> marked inconsistent again.
>
> There is no user activity on this PG, and nothing interesting is
> logged in any of the 2nd/3rd OSDs (with debug_osd=20, the first line
> mentioning the PG already says inactive+inconsistent).
>
>
> We suspect this is related to garbage files left in the PG folder. One
> of our PGs is acting basically like above, except it goes through this
> cycle: active+clean -> (deep-scrub) -> active+clean+inconsistent ->
> (repair) -> active+clean -> (restart primary OSD) -> (deep-scrub) ->
> active+clean+inconsistent. This one at least logs:
>
> 2015-07-22 16:42:41.821326 osd.303 [INF] 55.10d deep-scrub starts
> 2015-07-22 16:42:41.823834 osd.303 [ERR] 55.10d deep-scrub stat
> mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty, 0/0 omap, 0/0
> hit_set_archive, 0/0 whiteouts, 0/0 bytes,0/0 hit_set_archive bytes.
> 2015-07-22 16:42:41.823842 osd.303 [ERR] 55.10d deep-scrub 1 errors
>
> and this should be debuggable because the pool stats report only one object:
>
>     tapetest               55           0         0        73575G           1
>
> even though rados ls returns no objects:
>
> # rados ls -p tapetest
> #
>
> Any ideas?
>
> Cheers, Dan
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
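
To see at a glance which counters disagree in the deep-scrub "stat mismatch" line quoted above, a small parser can split it into pairs. This is a rough sketch: the function names are invented here, and reading each "a/b" pair as "what the scrub counted / what the PG stats record" is an assumption, although it fits this thread (rados ls finds nothing while the stats claim one object).

```python
import re
from typing import Dict, Tuple

# The [ERR] message from the thread, joined back onto one line.
LOG_LINE = (
    "55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0 clones, "
    "0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "0/0 bytes,0/0 hit_set_archive bytes."
)

# Matches "<int>/<int> <name>", where <name> is one or more lowercase words.
_PAIR = re.compile(r"(\d+)/(\d+) ([a-z_]+(?: [a-z_]+)*)")


def parse_stat_mismatch(message: str) -> Dict[str, Tuple[int, int]]:
    """Map each counter name to its (first, second) pair of values."""
    return {
        m.group(3): (int(m.group(1)), int(m.group(2)))
        for m in _PAIR.finditer(message)
    }


def mismatched(message: str) -> Dict[str, Tuple[int, int]]:
    """Keep only the counters whose two values differ."""
    return {k: v for k, v in parse_stat_mismatch(message).items() if v[0] != v[1]}


if __name__ == "__main__":
    for name, (first, second) in mismatched(LOG_LINE).items():
        print(f"{name}: {first} vs {second}")
```

For the line from this thread, only "objects" and "dirty" differ, which matches the single phantom object described in the replies.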