Re: issue #8752 (inconsistent PGs on RBD caching pool)

Dmitry Smirnov <onlyjob@xxxxxxxxxx> · Thu, 02 Oct 2014 23:47:13 +1000

On Sun, 21 Sep 2014 19:01:52 Sage Weil wrote:
> This is one we have never seen in our QA environment, and no real leads.

I'm quite surprised by this... Is it really that unusual to use a replicated
caching pool in front of an erasure-coded RBD pool? All my OSDs are
Btrfs-based, and I recently upgraded all kernels (i.e. kernel RBD clients) to
3.16.3.

Unlike some elusive issues that may be hard to replicate, this particular one
has been very persistent and noticeable: it takes no effort to reproduce at
all. I've been observing it for several months already...

It is unlikely that I have anything special in my v0.80.5 cluster's 
configuration...

> There are a couple slightly different scrub issues that pop up
> occasionally that we are trying to nail down, but this one is a bit
> different.  Being able to reliably reproduce it and generate logs is the
> usual strategy...

Please advise what kind of logs would be useful. Something like
"(debug ms = 1, debug osd = 20)" from the primary OSD of the inconsistent PG,
captured at the time the "scrub" command is issued?
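
For concreteness, here is a minimal sketch of the settings I have in mind,
assuming they go into the [osd] section of ceph.conf on the host of the
primary OSD. The section placement is my assumption; the two values are the
ones quoted above.

```
# Sketch only: assumed placement in ceph.conf on the primary OSD's host.
[osd]
    # messenger-level logging
    debug ms = 1
    # verbose OSD logging, to be active while the scrub runs
    debug osd = 20
```

I would revert these once the scrub has been captured, since logging at this
level is very verbose.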

Thanks.

-- 
All the best,
 Dmitry Smirnov.