Re: [PATCH] libceph: Complete stuck requests to OSD with EIO

Ilya Dryomov <idryomov@xxxxxxxxx> · Mon, 13 Feb 2017 16:32:47 +0100

On Mon, Feb 13, 2017 at 4:23 PM, Artur Molchanov
<artur.molchanov@xxxxxxxxxx> wrote:
> On 02/13/2017 05:15 PM, Ilya Dryomov wrote:
>>
>> On Mon, Feb 13, 2017 at 10:54 AM, Artur Molchanov
>> <artur.molchanov@xxxxxxxxxx> wrote:
>>>
>>> Hi Ilya,
>>>
>>> On 02/13/2017 12:11 PM, Ilya Dryomov wrote:
>>>>
>>>>
>>>> Hi Artur,
>>>>
>>>> How about the attached patch?  handle_timeout() is going to iterate
>>>> over all but the homeless OSD anyway; all it costs us is a couple of
>>>> tests, so I don't think a separate work is needed.
>>>>
>>>> abort_request() is a simple wrapper around complete_request(), making
>>>> it safe to call at any time -- replace it with complete_request() for
>>>> now if you want to try this out.
>>>
>>>
>>>
>>> Using one job both to send keepalive requests and to complete stuck
>>> requests means we need to check that osd_keepalive_timeout is not
>>> larger than osd_request_timeout. We should not forget to mention this
>>> in the documentation.
>>
>>
>> I'll make a note to mention that it is osd_keepalive_timeout-precise.
>>
>>> Is it worth creating a correlation between osd_keepalive_timeout and
>>> osd_request_timeout?
>>
>>
>> No, probably not.  osd_keepalive_timeout and osd_idle_ttl are similarly
>> related and we don't have any.
>
>
> Your variant of the patch works.
> As I said, using the handle_timeout job both to send keepalive requests
> and to abort stuck requests is not the perfect choice, but OK, it works.
>
> What should I do to get this patch merged upstream?

Nothing -- I'll apply it for 4.11 once we have abort_request().
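
To make the "osd_keepalive_timeout-precise" point above concrete, here is
a minimal user-space sketch. It is not the libceph code: struct
sim_request, sim_handle_timeout() and sim_run() are hypothetical names,
time is in whole seconds, and the only assumption taken from the thread
is that a single periodic job, run once per osd_keepalive_timeout, is
also the place where stuck requests are failed with -EIO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-flight OSD request. */
struct sim_request {
	unsigned long submitted;	/* when the request was sent */
	bool completed;			/* set once the request is failed */
	int result;			/* -EIO after a timeout */
	unsigned long completed_at;	/* tick at which it was failed */
};

/*
 * One pass of the periodic job: fail every request whose age has
 * reached request_timeout.  A request is only ever examined here,
 * so it cannot be failed between two ticks.
 */
static void sim_handle_timeout(struct sim_request *reqs, size_t n,
			       unsigned long now,
			       unsigned long request_timeout)
{
	for (size_t i = 0; i < n; i++) {
		struct sim_request *r = &reqs[i];

		if (r->completed || now < r->submitted)
			continue;
		if (now - r->submitted >= request_timeout) {
			r->completed = true;
			r->result = -EIO;
			r->completed_at = now;
		}
	}
}

/* Run the job every keepalive_timeout seconds, up to and including 'until'. */
static void sim_run(struct sim_request *reqs, size_t n,
		    unsigned long keepalive_timeout,
		    unsigned long request_timeout,
		    unsigned long until)
{
	for (unsigned long now = keepalive_timeout; now <= until;
	     now += keepalive_timeout)
		sim_handle_timeout(reqs, n, now, request_timeout);
}
```

With a keepalive interval of 5 and a request timeout of 7, a request
submitted at t=1 is failed at the t=10 tick rather than at t=8, and one
submitted at t=4 is failed at t=15 rather than at t=11. In this model a
request can therefore outlive its timeout by up to almost one keepalive
interval.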

Thanks,

                Ilya