On Fri, Nov 18, 2016 at 1:04 PM, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
> On 18 November 2016 at 13:14, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> Following up on the suggestion to use any of the following options
>>> to mitigate the time spent blocked on requests:
>>>
>>> - client_mount_timeout
>>> - rados_mon_op_timeout
>>> - rados_osd_op_timeout
>>>
>>> Is there really no other way around this?
>>>
>>> If two OSDs go down that between them hold both copies of an
>>> object, it would be nice to have clients fail *immediately*. I've
>>> tried reducing the rados_osd_op_timeout setting to 0.5, but when
>>> things go wrong, it still results in the collapse of the cluster and
>>> of all reads from it.
>>
>> Can you be more specific about what is happening when you set
>> rados_osd_op_timeout? You're not seeing timeouts at all, and operations
>> are blocking instead?
>>
>
> Certainly, they are timing out, but the problem is a numbers game.
>
> Let's say there are 8 client workers, and between them they are
> handling 250 requests per second. A DR situation happens and two OSDs
> go down, taking 60 PGs with them, belonging to a pool with 1024 PGs. Now
> you have a situation where 1 in every (1024 / 60) requests to Ceph
> will time out. Eventually you end up with all clients blocked, waiting
> for either a response from the OSD or ETIMEDOUT.
>
>> If you can provide a short librados program that demonstrates an op
>> blocking indefinitely even when a timeout is set, that would be
>> useful.
>>
>
> It's not blocking indefinitely, but the fact that it's blocking at all
> is a concern. If a PG is down, there is no use waiting for it to come
> back up. Just give up on the read operation and notify the client
> immediately, rather than blocking the client from doing anything else.

OK, so you want a new behaviour where your requests are cancelled when
OSDs go down, as opposed to timing out. Clearly that's not what the
current code does: you would have to modify Ceph yourself to do this.
Look at Objecter::_scan_requests to see how it currently responds to
osdmap updates that affect requests in flight -- it scans through them
to identify which ones need resending to a different OSD; you would add
an extra behaviour that identifies requests that are no longer
serviceable and cancels them.

John

> To clarify another point, it makes no sense to use AIO in my
> case. The clients in question are nginx worker threads, and they
> manage async processing between them. Where async doesn't happen is
> when a thread is stuck inside a stat() or read() call into librados.
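
For reference, a minimal librados sketch (not taken from the thread) of
setting the timeout options listed at the top per client with
rados_conf_set() before connecting, so a synchronous stat() against an
unserviceable PG returns -ETIMEDOUT instead of blocking the worker
indefinitely. The pool name ("mypool") and object name ("myobject") are
placeholders.

/*
 * Sketch: apply the client-side timeout options discussed above via
 * rados_conf_set() so a blocked stat() returns an error instead of
 * hanging the worker. "mypool" and "myobject" are placeholders.
 *
 * Build: cc timeout_demo.c -lrados
 */
#include <rados/librados.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    uint64_t size;
    time_t mtime;
    int ret;

    if (rados_create(&cluster, "admin") < 0)   /* connect as client.admin */
        return 1;
    rados_conf_read_file(cluster, NULL);       /* default ceph.conf search path */

    /* The options suggested earlier in the thread, set per client. */
    rados_conf_set(cluster, "rados_osd_op_timeout", "0.5");
    rados_conf_set(cluster, "rados_mon_op_timeout", "5");
    rados_conf_set(cluster, "client_mount_timeout", "5");

    if (rados_connect(cluster) < 0) {
        rados_shutdown(cluster);
        return 1;
    }
    if (rados_ioctx_create(cluster, "mypool", &io) < 0) {
        rados_shutdown(cluster);
        return 1;
    }

    ret = rados_stat(io, "myobject", &size, &mtime);
    if (ret == -ETIMEDOUT)
        fprintf(stderr, "stat timed out; PG presumably unserviceable\n");
    else if (ret < 0)
        fprintf(stderr, "stat failed: %d\n", ret);
    else
        printf("size=%llu mtime=%lld\n",
               (unsigned long long)size, (long long)mtime);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}

As discussed above, this only bounds how long each call blocks; it does
not cancel in-flight requests when the osdmap changes -- that would need
the Objecter change John describes.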