On Thu, 2 Oct 2014, Dmitry Smirnov wrote:
> On Sun, 21 Sep 2014 19:01:52 Sage Weil wrote:
> > This is one we have never seen in our QA environment, and no real leads.
>
> I'm much surprised about this... Is it really that unusual to use replicated
> caching pool in front of RBD erasure pool? All my OSDs are Btrfs-based and
> recently I've upgraded all kernels (i.e. kernel RBD clients) to 3.16.3.

My guess is a btrfs issue. The weird thing about your report is the byte
totals are off by an uneven number of bytes (3 bytes, 9 bytes, etc.).

We haven't ever seen this. We do test RBD over cache tiers on btrfs, but
not with EC on the base. I'll add that combo to the matrix. My first guess
is a btrfs issue, honestly.

> Unlike some shifty issues that may be hard to replicate this particular one
> was very persistent and noticeable, no effort to reproduce at all. I've been
> observing it for several months already...

Does it continue to come up after the kernels are upgraded (and after a
full cycle of scrub and repairs have been done to clear out inconsistencies
introduced while running the older kernel)?

sage

> It is unlikely that I have anything special in my v0.80.5 cluster's
> configuration...
>
> > There are a couple slightly different scrub issues that pop up
> > occasionally that we are trying to nail down, but this one is a bit
> > different. Being able to reliably reproduce it and generate logs is the
> > usual strategy...
>
> Please advise what kind of logs could be useful. Something like "(debug ms =
> 1, debug osd = 20)" from primary OSD where inconsistent PG lies at a time when
> "scrub" command is given?
>
> Thanks.
>
> --
> All the best,
> Dmitry Smirnov.