On Thu, Oct 22, 2015 at 8:48 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Thu, Oct 22, 2015 at 1:43 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
>> On Wed, Oct 21, 2015 at 5:33 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>> On Wed, Oct 21, 2015 at 10:33 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> John, I know you've got
>>>>> https://github.com/ceph/ceph-qa-suite/pull/647. I think that's
>>>>> supposed to be for this, but I'm not sure if you spotted any issues
>>>>> with it or if we need to do some more diagnosing?
>>>>
>>>> That test path is just verifying that we do handle dirs without dying
>>>> in at least one case -- it passes with the existing ceph code, so it's
>>>> not reproducing this issue.
>>>
>>> Clicked send too soon, I was about to add...
>>>
>>> Milosz mentioned that they don't have the data from the system in the
>>> broken state, so I don't have any bright ideas about learning more
>>> about what went wrong here unfortunately.
>>>
>>
>> Sorry about that, wasn't thinking at the time and just wanted to get
>> this up and going as quickly as possible :(
>>
>> If this happens next time I'll be more careful to keep more evidence.
>> I think multi-fs in the same rados namespace support would actually
>> have helped here, since it makes it easier to create a newfs and leave
>> the other one around (for investigation)
>
> Yep, good point. I am a known enthusiast for multi-filesystem support :-)
>
>> But it makes me wonder whether the broken dir scenario can be
>> replicated by hand using rados calls. There's a pretty generic ticket
>> there for "don't die on dir errors", but I imagine the code can be
>> audited and steps to cause a synthetic error can be produced.
>
> Yes, that part I have done (and will build into the automated tests in
> due course) -- the bit that is still a mystery is how the damage
> occurred to begin with.

John, my money is on me somehow fumbling the recovery process. And since
the bash history has fallen off, I'm going to assume that.

--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@xxxxxxxxx