Re: MDS stuck in a crash loop

On Thu, Oct 22, 2015 at 8:48 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Thu, Oct 22, 2015 at 1:43 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
>> On Wed, Oct 21, 2015 at 5:33 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>> On Wed, Oct 21, 2015 at 10:33 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> John, I know you've got
>>>>> https://github.com/ceph/ceph-qa-suite/pull/647. I think that's
>>>>> supposed to be for this, but I'm not sure if you spotted any issues
>>>>> with it or if we need to do some more diagnosing?
>>>>
>>>> That test path is just verifying that we do handle dirs without dying
>>>> in at least one case -- it passes with the existing ceph code, so it's
>>>> not reproducing this issue.
>>>
>>> Clicked send too soon, I was about to add...
>>>
>>> Milosz mentioned that they don't have the data from the system in the
>>> broken state, so I don't have any bright ideas about learning more
>>> about what went wrong here unfortunately.
>>>
>>
>> Sorry about that, wasn't thinking at the time and just wanted to get
>> this up and going as quickly as possible :(
>>
>> If this happens next time I'll be more careful to keep more evidence.
>> I think multi-fs support in the same rados namespace would actually
>> have helped here, since it makes it easier to create a newfs and leave
>> the other one around (for investigation).
>
> Yep, good point.  I am a known enthusiast for multi-filesystem support :-)
>
>> But it makes me wonder whether the broken dir scenario can be
>> replicated by hand using rados calls. There's a pretty generic ticket
>> there for "don't die on dir errors", but I imagine the code can be
>> audited and steps to cause a synthetic error can be produced.
>
> Yes, that part I have done (and will build into the automated tests in
> due course) -- the bit that is still a mystery is how the damage
> occurred to begin with.

John, my money is on me somehow fumbling the recovery process. And,
with the bash history having fallen off, I'm going to assume that.

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@xxxxxxxxx