2012/1/23 Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx>:
> On Thu, Jan 19, 2012 at 12:36 PM, Andrey Stepachev <octo47@xxxxxxxxx> wrote:
>> 2012/1/19 Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx>:
>>> On Thu, Jan 19, 2012 at 12:53 AM, Andrey Stepachev <octo47@xxxxxxxxx> wrote:
>>>> 2012/1/19 Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx>:
>>>>> On Wednesday, January 18, 2012, Andrey Stepachev <octo47@xxxxxxxxx> wrote:
>>>>>> 2012/1/19 Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx>:
>>>>>>> On Wed, Jan 18, 2012 at 12:48 PM, Andrey Stepachev <octo47@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>> But I still don't know what happens with ceph, so it can't
>>>>>>>> respond and hangs. This is not good behavior, because
>>>>>>>> such a situation leads to an unresponsive cluster in case of
>>>>>>>> a temporary network failure.
>>>>>>>
>>>>>>> I'm a little concerned about this — I would expect to see hangs of up
>>>>>>> to ~30 seconds (the timeout period), but for operations to then
>>>>>>> continue. Are you putting the MDS down? If so, do you have any
>>>>>>> standbys specified?
>>>>>>
>>>>>> Yes, the MDS goes down (I restart it at some point while changing
>>>>>> something in the config).
>>>>>> Yes, I have 2 standbys.
>>>>>> Clients hang for more than 10 minutes.
>>>>>
>>>>> Okay, so it's probably an issue with the MDS not entering recovery when it
>>>>> should. Are you also taking down one of the monitor nodes? There's a known
>>>>> bug which can cause a standby MDS to wait up to 15 minutes if its monitor
>>>>> goes down, which is fixed in latest master (and maybe .40; I'd have to
>>>>> check).
>>>>
>>>> Yes. I have collocated mon, mds, and osd on some nodes,
>>>> and I restart all daemons at once. I use 0.40 (built from my github fork).
>>>
>>> Hrm. I checked and the fix is in 0.40. Can you reproduce this with
>>> client logging enabled (--debug_ms 1 --debug_client 10) and post the
>>> logs somewhere for me to check out? That should be able to isolate the
>>> problem area at least.
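[For readers following along: the flags Greg suggests can also be made persistent in ceph.conf, so the client keeps logging across remounts. A minimal sketch; the section name is standard, but the log path is an assumption, not taken from this thread:

    ; hypothetical ceph.conf fragment: verbose client-side logging
    [client]
        debug ms = 1       ; messenger-level traffic logging
        debug client = 10  ; detailed client state (caps, MDS sessions)
        log file = /var/log/ceph/client.$pid.log  ; assumed log location

Higher debug levels produce more detail; 1/10 is what Greg asked for here.]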
>>
>> The client writes "renew caps" and nothing more.
>> I'd try to reproduce the problem with more logging, but still no luck.
>> Maybe the debug output serializes a race somewhere and prevents
>> this bug from occurring.
>
> Any updates on this? "renew caps" being the last thing in the log
> doesn't actually mean much, unfortunately. We're going to need logs of
> some description in order to give you any more help.

I've been switched to another urgent task now, so in a week or two I'll
return to Ceph and try to reproduce these hangs to find out what is
going on.

> -Greg

--
Andrey.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
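[For a one-off reproduction with a FUSE client, the same flags can be passed directly on the ceph-fuse command line. A sketch only; the monitor address, mountpoint, and log path are assumptions for illustration:

    # mount with verbose client debugging enabled
    # (mon-host:6789, /mnt/ceph, and the log path are placeholders)
    ceph-fuse -m mon-host:6789 /mnt/ceph \
        --debug_ms 1 --debug_client 10 \
        --log-file /var/log/ceph/client.repro.log

    # ...trigger the MDS restart and reproduce the hang, then:
    fusermount -u /mnt/ceph

The resulting log should show whether the client's MDS session ever moves past cap renewal into reconnect, which is what Greg needs to see.]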