The same problem still occurs. I will need to check when I have time to gather logs again.

On 14.08.2013 01:11, Samuel Just wrote:
> I'm not sure, but your logs did show that you had >16 recovery ops in
> flight, so it's worth a try. If it doesn't help, collect the same set
> of logs and I'll look again. Also, there are a few other patches
> between 61.7 and current cuttlefish which may help.
> -Sam
>
> On Tue, Aug 13, 2013 at 2:03 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>>
>> On 13.08.2013 at 22:43, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>
>>> I just backported a couple of patches from next to fix a bug where we
>>> weren't respecting the osd_recovery_max_active config in some cases
>>> (1ea6b56170fc9e223e7c30635db02fa2ad8f4b4e). You can either try the
>>> current cuttlefish branch or wait for a 61.8 release.
>>
>> Thanks! Are you sure that this is the issue? I don't believe it is, but
>> I'll give it a try. A few weeks ago I already tested a branch from Sage
>> where he fixed a race regarding max active, so active recovery was
>> capped at 1, but the issue didn't go away.
>>
>> Stefan
>>
>>> -Sam
>>>
>>> On Mon, Aug 12, 2013 at 10:34 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>>> I got swamped today. I should be able to look tomorrow. Sorry!
>>>> -Sam
>>>>
>>>> On Mon, Aug 12, 2013 at 9:39 PM, Stefan Priebe - Profihost AG
>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>> Did you take a look?
>>>>>
>>>>> Stefan
>>>>>
>>>>> On 11.08.2013 at 05:50, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>>>>
>>>>>> Great! I'll take a look on Monday.
>>>>>> -Sam
>>>>>>
>>>>>> On Sat, Aug 10, 2013 at 12:08 PM, Stefan Priebe <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>> Hi Samuel,
>>>>>>>
>>>>>>> On 09.08.2013 23:44, Samuel Just wrote:
>>>>>>>
>>>>>>>> I think Stefan's problem is probably distinct from Mike's.
>>>>>>>>
>>>>>>>> Stefan: Can you reproduce the problem with
>>>>>>>>
>>>>>>>> debug osd = 20
>>>>>>>> debug filestore = 20
>>>>>>>> debug ms = 1
>>>>>>>> debug optracker = 20
>>>>>>>>
>>>>>>>> on a few osds (including the restarted osd), and upload those osd logs
>>>>>>>> along with the ceph.log from before killing the osd until after the
>>>>>>>> cluster becomes clean again?
>>>>>>>
>>>>>>> Done - you'll find the logs in the cephdrop folder:
>>>>>>> slow_requests_recovering_cuttlefish
>>>>>>>
>>>>>>> osd.52 was the one recovering
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
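
As an aside, the debug levels Sam asked for can either go into ceph.conf or be injected into the running OSDs. A minimal sketch, assuming osd.52 is one of the OSDs being traced (adjust the id for your cluster):

    # In ceph.conf, under the [osd] section (takes effect on OSD restart):
    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1
        debug optracker = 20

    # Or inject at runtime without restarting, here for osd.52:
    ceph osd tell 52 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1 --debug-optracker 20'

Remember to lower the levels again once the logs are captured; level-20 logging is very verbose and can itself slow the OSD down.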
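
To verify that the backported fix is actually capping recovery, the effective osd_recovery_max_active value and the requests currently tracked as in flight can be read from an OSD's admin socket. A sketch, assuming the default socket path /var/run/ceph/ceph-osd.52.asok:

    # Show the effective recovery settings on osd.52:
    ceph --admin-daemon /var/run/ceph/ceph-osd.52.asok config show | grep recovery

    # Dump the requests the op tracker currently has in flight:
    ceph --admin-daemon /var/run/ceph/ceph-osd.52.asok dump_ops_in_flight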