Re: MDS stuck in a crash loop

On Thu, Oct 22, 2015 at 1:43 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
> On Wed, Oct 21, 2015 at 5:33 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Wed, Oct 21, 2015 at 10:33 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> John, I know you've got
>>>> https://github.com/ceph/ceph-qa-suite/pull/647. I think that's
>>>> supposed to be for this, but I'm not sure if you spotted any issues
>>>> with it or if we need to do some more diagnosing?
>>>
>>> That test path is just verifying that we do handle dirs without dying
>>> in at least one case -- it passes with the existing ceph code, so it's
>>> not reproducing this issue.
>>
>> Clicked send too soon; I was about to add...
>>
>> Milosz mentioned that they don't have the data from the system in the
>> broken state, so I don't have any bright ideas about learning more
>> about what went wrong here unfortunately.
>>
>
> Sorry about that, wasn't thinking at the time and just wanted to get
> this up and going as quickly as possible :(
>
> If this happens next time I'll be more careful to keep more evidence.
> I think support for multi-fs in the same rados namespace would
> actually have helped here, since it makes it easier to create a newfs
> and leave the other one around (for investigation).

Yep, good point.  I am a known enthusiast for multi-filesystem support :-)

> But it makes me think that the broken dir scenario can probably be
> replicated by hand using rados calls. There's a pretty generic ticket
> for "don't die on dir errors", but I imagine the code can be audited
> and steps to cause a synthetic error can be produced.

Yes, that part I have done (and will build into the automated tests in
due course) -- the bit that is still a mystery is how the damage
occurred to begin with.

John

>
> --
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
>
> p: 646-253-9055
> e: milosz@xxxxxxxxx