On Fri, Nov 18, 2016 at 1:04 PM, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
> On 18 November 2016 at 13:14, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> Following up on the suggestion to use any of the following options
>>> to mitigate the time spent blocked on requests:
>>>
>>> - client_mount_timeout
>>> - rados_mon_op_timeout
>>> - rados_osd_op_timeout
>>>
>>> Is there really no other way around this?
>>>
>>> If two OSDs go down that between them hold both copies of an
>>> object, it would be nice to have clients fail *immediately*. I've
>>> tried reducing the rados_osd_op_timeout setting to 0.5, but when
>>> things go wrong, it still results in the collapse of the cluster and
>>> of all reads from it.
>>
>> Can you be more specific about what is happening when you set
>> rados_osd_op_timeout? You're not seeing timeouts at all, and operations
>> are blocking instead?
>>
>
> Certainly, they are timing out, but the problem is a numbers game.
>
> Let's say there are 8 client workers, and between them they are
> handling 250 requests per second. A DR situation happens and two OSDs
> go down, taking 60 PGs with them, belonging to a pool with 1024 PGs. Now
> you have a situation where 1 in every (1024 / 60) requests to Ceph
> will time out. Eventually you end up with all clients blocked, waiting
> for either a response from the OSD or ETIMEDOUT.
>
>> If you can provide a short librados program that demonstrates an op
>> blocking indefinitely even when a timeout is set, that would be
>> useful.
>>
>
> It's not blocking indefinitely, but the fact that it's blocking at all
> is a concern. If a PG is down, there is no use waiting for it to come
> back up. Just give up on the read operation and notify the client
> immediately, rather than blocking the client from doing anything else.

OK, so you want a new behaviour where your requests are cancelled when
OSDs go down, as opposed to timing out. Clearly that's not what the
current code does: you would have to modify Ceph yourself to do this.
Look at Objecter::_scan_requests to see how it currently responds to
osdmap updates that affect requests in flight -- it scans through them
to identify which ones need resending to a different OSD; you would add
an extra behaviour that identifies requests that are no longer
serviceable and cancels them.

John

> To clarify another point, it makes no sense to use AIO in my
> case. The clients in question are nginx worker threads, and they
> manage async processing between them. Where async doesn't happen is
> when a thread is stuck inside a stat() or read() call into librados.
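
For reference, a minimal librados sketch (not taken from the thread) of
setting the timeout options listed at the top per client with
rados_conf_set() before connecting, so a synchronous stat() against an
unserviceable PG returns -ETIMEDOUT instead of blocking the worker
indefinitely. The pool name ("mypool") and object name ("myobject") are
placeholders.

/*
 * Sketch: apply the client-side timeout options discussed above via
 * rados_conf_set() so a blocked stat() returns an error instead of
 * hanging the worker. "mypool" and "myobject" are placeholders.
 *
 * Build: cc timeout_demo.c -lrados
 */
#include <rados/librados.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    uint64_t size;
    time_t mtime;
    int ret;

    if (rados_create(&cluster, "admin") < 0)   /* connect as client.admin */
        return 1;
    rados_conf_read_file(cluster, NULL);       /* default ceph.conf search path */

    /* The options suggested earlier in the thread, set per client. */
    rados_conf_set(cluster, "rados_osd_op_timeout", "0.5");
    rados_conf_set(cluster, "rados_mon_op_timeout", "5");
    rados_conf_set(cluster, "client_mount_timeout", "5");

    if (rados_connect(cluster) < 0) {
        rados_shutdown(cluster);
        return 1;
    }
    if (rados_ioctx_create(cluster, "mypool", &io) < 0) {
        rados_shutdown(cluster);
        return 1;
    }

    ret = rados_stat(io, "myobject", &size, &mtime);
    if (ret == -ETIMEDOUT)
        fprintf(stderr, "stat timed out; PG presumably unserviceable\n");
    else if (ret < 0)
        fprintf(stderr, "stat failed: %d\n", ret);
    else
        printf("size=%llu mtime=%lld\n",
               (unsigned long long)size, (long long)mtime);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}

As discussed above, this only bounds how long each call blocks; it does
not cancel in-flight requests when the osdmap changes -- that would need
the Objecter change John describes.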