Re: still recovery issues with cuttlefish

Samuel Just <sam.just@xxxxxxxxxxx> · Tue, 13 Aug 2013 16:11:46 -0700



I'm not sure, but your logs did show that you had more than 16 recovery
ops in flight, so it's worth a try.  If it doesn't help, collect the
same set of logs again and I'll take another look.  Also, there are a
few other patches between 61.7 and current cuttlefish which may help.
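
A minimal ceph.conf sketch for the next reproduction run. It assumes the
settings go in the [osd] section on the nodes whose logs are collected,
and it combines the debug levels requested earlier in this thread with
an explicit cap on concurrent recovery ops. The value 1 for
"osd recovery max active" is an illustrative assumption, not a
recommendation:

[osd]
    # log levels requested for reproducing the slow requests during recovery
    debug osd = 20
    debug filestore = 20
    debug ms = 1
    debug optracker = 20
    # illustrative (assumed) cap on concurrent recovery ops per OSD
    osd recovery max active = 1
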
-Sam

On Tue, Aug 13, 2013 at 2:03 PM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
>
> On 13.08.2013 at 22:43, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>
>> I just backported a couple of patches from next to fix a bug where we
>> weren't respecting the osd_recovery_max_active config in some cases
>> (1ea6b56170fc9e223e7c30635db02fa2ad8f4b4e).  You can either try the
>> current cuttlefish branch or wait for a 61.8 release.
>
> Thanks! Are you sure that this is the issue? I don't believe it is, but I'll give it a try. A few weeks ago I already tested a branch from Sage where he fixed a race regarding max active. With it, at most one recovery was active, but the issue didn't go away.
>
> Stefan
>
>> -Sam
>>
>> On Mon, Aug 12, 2013 at 10:34 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>> I got swamped today.  I should be able to look tomorrow.  Sorry!
>>> -Sam
>>>
>>> On Mon, Aug 12, 2013 at 9:39 PM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>> Did you take a look?
>>>>
>>>> Stefan
>>>>
>>>> On 11.08.2013 at 05:50, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>>>
>>>>> Great!  I'll take a look on Monday.
>>>>> -Sam
>>>>>
>>>>> On Sat, Aug 10, 2013 at 12:08 PM, Stefan Priebe <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hi Samuel,
>>>>>>
>>>>>> On 09.08.2013 23:44, Samuel Just wrote:
>>>>>>
>>>>>>> I think Stefan's problem is probably distinct from Mike's.
>>>>>>>
>>>>>>> Stefan: Can you reproduce the problem with
>>>>>>>
>>>>>>> debug osd = 20
>>>>>>> debug filestore = 20
>>>>>>> debug ms = 1
>>>>>>> debug optracker = 20
>>>>>>>
>>>>>>> on a few osds (including the restarted osd), and upload those osd logs
>>>>>>> along with the ceph.log from before killing the osd until after the
>>>>>>> cluster becomes clean again?
>>>>>>
>>>>>>
>>>>>> Done. You'll find the logs in the cephdrop folder:
>>>>>> slow_requests_recovering_cuttlefish
>>>>>>
>>>>>> osd.52 was the one recovering
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html