On Sat, 2017-02-11 at 23:30 +0300, Artur Molchanov wrote:
> Hi Jeff,
>
> On Fri, 2017-02-10 at 14:31, Jeff Layton wrote:
> > On Thu, 2017-02-09 at 16:04 +0300, Artur Molchanov wrote:
> > > From: Artur Molchanov <artur.molchanov@xxxxxxxxxx>
> > >
> > > Complete stuck requests to the OSD with error EIO after osd_request_timeout
> > > expires. If osd_request_timeout equals 0 (the default value), do nothing
> > > with hung requests (keep the default behavior).
> > >
> > > Create an RBD map option osd_request_timeout to set the timeout in seconds.
> > > Set osd_request_timeout to 0 by default.
> > >
> >
> > Also, what exactly are the requests blocked on when this occurs? Is the
> > ceph_osd_request_target ending up paused? I wonder if we might be better
> > off with something that returns a hard error under the circumstances
> > where you're hanging, rather than depending on timeouts.
>
> I think it is better to complete requests only after the timeout expires,
> because a request can fail due to temporary network issues (e.g. a router
> restarted) or a machine/service restart.
>
> > Having a job that has to wake up every second or so isn't ideal. Perhaps
> > you would be better off scheduling the delayed work in the request
> > submission codepath, and only rearming it when the tree isn't empty after
> > calling complete_osd_stuck_requests?
>
> Would you please tell me more about rearming the work only if the tree is not
> empty after calling complete_osd_stuck_requests? From what code should we
> call complete_osd_stuck_requests?
>

Sure. I'm saying you would want to call schedule_delayed_work for the timeout
handler from the request submission path, when you link a request that has a
timeout into the tree. Maybe in __submit_request?

Then, instead of unconditionally calling schedule_delayed_work at the end of
handle_request_timeout, you'd only call it if there were requests still
sitting in the osdc trees.

> As I understand it, there are two primary cases:
>
> 1 - Requests to the OSD failed, but the monitors do not return a new osdmap
> (because all monitors are offline, or the monitors have not updated the
> osdmap yet).
> In this case requests are retried by cyclically calling ceph_con_workfn ->
> con_fault -> ceph_con_workfn. We can check the request timestamp and complete
> the request instead of calling con_fault.
>
> 2 - The monitors return a new osdmap which does not have any OSD for the RBD.
> In this case all requests to the last ready OSD will be linked to the
> "homeless" OSD and will not be retried until a new osdmap with an appropriate
> OSD is received. I think we need additional periodic checking of the
> timestamps of such requests.
>
> Yes, there is already an existing job, handle_timeout. But the responsibility
> of this job is to send keepalive requests to slow OSDs. I'm not sure it is a
> good idea to perform additional actions inside this job.
> I decided that creating a specific job, handle_osd_request_timeout, is more
> applicable.
>
> This job will be run only once with the default value of osd_request_timeout
> (0).

Ahh, I missed that -- thanks.

> At the same time, I think users will not use too small a value for this
> parameter. I expect the typical value will be about 1 minute or greater.
>
> > Also, I don't see where this job is ever cancelled when the osdc is torn
> > down. That needs to occur or you'll cause a use-after-free oops...
>
> That is my fault, thanks for the correction.
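
To flesh out the rearming idea a bit, here's a rough, completely untested
sketch. The field and work item names (osd_request_timeout,
osd_request_timeout_work) and the any_osd_requests() helper are placeholders
I just made up, and complete_osd_stuck_requests() stands in for the helper
from your patch:

/* Arm the timer when a request is linked into a tree, e.g. called
 * from __submit_request(). With the default timeout of 0 the work
 * is never scheduled at all. */
static void maybe_arm_osd_request_timeout(struct ceph_osd_client *osdc)
{
	if (osdc->osd_request_timeout)	/* 0 == disabled */
		schedule_delayed_work(&osdc->osd_request_timeout_work,
				      osdc->osd_request_timeout * HZ);
}

/* True if any request is still sitting in an osdc tree, including
 * ones parked on the homeless OSD. */
static bool any_osd_requests(struct ceph_osd_client *osdc)
{
	struct rb_node *n;

	if (!RB_EMPTY_ROOT(&osdc->homeless_osd.o_requests))
		return true;

	for (n = rb_first(&osdc->osds); n; n = rb_next(n)) {
		struct ceph_osd *osd =
			rb_entry(n, struct ceph_osd, o_node);

		if (!RB_EMPTY_ROOT(&osd->o_requests))
			return true;
	}
	return false;
}

static void handle_osd_request_timeout(struct work_struct *work)
{
	struct ceph_osd_client *osdc =
		container_of(work, struct ceph_osd_client,
			     osd_request_timeout_work.work);

	down_write(&osdc->lock);
	complete_osd_stuck_requests(osdc);	/* complete with -EIO */

	/* Rearm only while there is still something to watch,
	 * instead of unconditionally. */
	if (any_osd_requests(osdc))
		schedule_delayed_work(&osdc->osd_request_timeout_work,
				      osdc->osd_request_timeout * HZ);
	up_write(&osdc->lock);
}

That way the work only runs while there are requests that could time out,
rather than waking up periodically for the life of the mount.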
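And for the teardown issue, a sync cancel at osdc shutdown (ceph_osdc_stop,
say) should be all that's needed -- again assuming the placeholder name
above:

	/* Make sure the handler has finished and cannot rearm itself
	 * before the osdc is freed. */
	cancel_delayed_work_sync(&osdc->osd_request_timeout_work);

cancel_delayed_work_sync() waits for a running instance to finish and can
cope with self-requeueing work, which is what closes the use-after-free
window.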
-- 
Jeff Layton <jlayton@xxxxxxxxxx>