Re: [PATCH] libceph: Complete stuck requests to OSD with EIO

Artur Molchanov <artur.molchanov@xxxxxxxxxx> · Mon, 13 Feb 2017 18:23:24 +0300

On 02/13/2017 05:15 PM, Ilya Dryomov wrote:
On Mon, Feb 13, 2017 at 10:54 AM, Artur Molchanov
<artur.molchanov@xxxxxxxxxx> wrote:
Hi Ilya,

On 02/13/2017 12:11 PM, Ilya Dryomov wrote:

Hi Artur,

How about the attached patch?  handle_timeout() is going to iterate
over all but the homeless OSD anyway; all it costs us is a couple of
tests, so I don't think a separate work is needed.

abort_request() is a simple wrapper around complete_request(), making
it safe to call at any time -- replace it with complete_request() for
now if you want to try this out.
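
For readers following the thread, the idea of letting one periodic pass both walk the requests and abort the stuck ones can be sketched in user space roughly as follows. This is only an illustrative model, not the attached patch: struct model_request, model_abort_request(), model_handle_timeout() and MODEL_EIO are hypothetical names, and treating a request_timeout of 0 as "disabled" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the libceph structures. */
struct model_request {
    unsigned long start_stamp;  /* tick at which the request was submitted */
    bool completed;             /* true once the request has finished */
    int result;                 /* 0 or a negative errno-style code */
};

#define MODEL_EIO 5  /* stand-in for EIO, defined locally for the sketch */

/* Stand-in for a thin wrapper that completes a request with an error. */
static void model_abort_request(struct model_request *req, int err)
{
    req->completed = true;
    req->result = err;
}

/*
 * One pass of the periodic handler over all in-flight requests.
 * Returns the number of requests aborted in this pass.
 * Assumption: request_timeout == 0 means the timeout is disabled.
 */
static size_t model_handle_timeout(struct model_request *reqs, size_t n,
                                   unsigned long now,
                                   unsigned long request_timeout)
{
    size_t aborted = 0;

    for (size_t i = 0; i < n; i++) {
        struct model_request *req = &reqs[i];

        if (req->completed)
            continue;

        /* The extra test: has this request been pending for too long? */
        if (request_timeout && now - req->start_stamp >= request_timeout) {
            model_abort_request(req, -MODEL_EIO);
            aborted++;
        }
        /* A real handler would also decide here whether a keepalive is due. */
    }
    return aborted;
}
```

With requests submitted at ticks 0 and 8, a pass at tick 10 with a timeout of 5 would abort only the first one; the cost per request is a single comparison inside a loop that runs anyway.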

Using one job both for sending keepalive requests and for completing stuck
requests means we need to check that osd_keepalive_timeout is not larger than
osd_request_timeout. So we should not forget to mention it in the
documentation.

I'll make a note to mention that it is osd_keepalive_timeout-precise.
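
One way to read "osd_keepalive_timeout-precise": if the timeout is only checked when the periodic handler runs, a stuck request is aborted on the first handler pass at or after its deadline, not at the deadline itself. A small sketch of that arithmetic follows, under the assumption (made for illustration only) that the handler fires at exact multiples of keepalive_timeout starting from tick 0; model_abort_tick() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Hypothetical helper: tick at which a request submitted at `start` would be
 * aborted, assuming the handler runs at ticks 0, K, 2K, ... where
 * K = keepalive_timeout (must be non-zero).
 */
static unsigned long model_abort_tick(unsigned long start,
                                      unsigned long request_timeout,
                                      unsigned long keepalive_timeout)
{
    unsigned long deadline = start + request_timeout;

    /* Round the deadline up to the next handler pass. */
    return (deadline + keepalive_timeout - 1) / keepalive_timeout
           * keepalive_timeout;
}
```

Under these assumptions, a request submitted at tick 1 with a request timeout of 10 and a keepalive timeout of 5 has its deadline at tick 11 but is aborted at tick 15. With a request timeout of 3, smaller than the keepalive timeout, the deadline is tick 4 and the abort happens at tick 5, so the extra delay is always less than one keepalive period.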

Is it worth creating a correlation between osd_keepalive_timeout and
osd_request_timeout?

No, probably not.  osd_keepalive_timeout and osd_idle_ttl are similarly
related and we don't have any.

Your variant of the patch works.
As I said, using the handle_timeout job both to send keepalive requests and
to abort stuck requests is not the perfect choice, but OK, it works.

What should I do to get this patch merged upstream?

--
Artur
