On Fri, Aug 1, 2014 at 5:27 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Aug 01, 2014 at 04:56:27PM +0400, Ilya Dryomov wrote:
>> I'm going to fix up rbd_request_fn(), but I want to make sure
>> I understand this in full.
>>
>> - Previously, the danger of calling blocking primitives on the way to
>>   schedule(), i.e. with task->state != TASK_RUNNING, was that if the
>>   blocking primitive did in fact block, the task state would be set
>>   back to TASK_RUNNING and the schedule() the task was on the way to
>>   wouldn't have any effect.  Your "Add extra reschedule point" patch
>>   essentially made calling mutex_lock() (and probably others) much more
>>   wrong than it used to be, because mutex_lock() may now reschedule
>>   when the task is not on the mutex wait queue.
>
> Right, and in general we cannot allow spurious wakeups (although we try
> very hard to deal with them in generic code, which is why things more or
> less worked).
>
> But if you do a patch that 'randomly' ignores ->state on schedule (I did
> one) stuff comes apart _real_ quick.
>
> Therefore you should very much not destroy ->state on the way to
> schedule.
>
>> - There is nothing wrong with releasing queue_lock and reenabling IRQs
>>   in rbd_request_fn(), as long as it doesn't block and I remember to
>>   disable IRQs and take queue_lock back on return.
>
> Releasing queue_lock might be OK (dunno about the blk locking), but
> reenabling IRQs is actually wrong as per blk_flush_plug_list(), since
> that uses local_irq_save()/restore(), which means it can be called from
> contexts which cannot deal with enabling IRQs, and then your
> ->request_fn() goes and does that.
>
> Now maybe blk_flush_plug_list() is overly paranoid and it could use
> local_irq_disable()/enable() instead, I don't know.  But until it does,
> a request_fn() should never reenable IRQs.
>
>> I'm asking because rbd_request_fn() is probably not the only code path
>> broken in this way.  I poked around and found read_events() in aio.c;
>> it seems to have been written with the "danger" assumption that
>> I outlined above, and there is even a comment to that effect.
>
> I'm fairly sure there's more broken stuff, I didn't dare looking.
>
>> Does the above make sense, or am I missing something?
>
> I think that's about it.

Thanks for clarifying things.  CC'ing Kent to draw attention to
read_events().

Ilya
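
P.S.  For reference, a minimal sketch of the broken pattern discussed
above.  The driver below (my_dev, its wait queue, mutex and flag) is
made up purely for illustration and is not the rbd code:

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/mutex.h>

/* Hypothetical device -- just enough state to show the pattern. */
struct my_dev {
	wait_queue_head_t wq;
	struct mutex lock;
	bool event_pending;
};

static void broken_wait_for_event(struct my_dev *dev)
{
	DEFINE_WAIT(wait);

	/* Sets current->state to TASK_UNINTERRUPTIBLE. */
	prepare_to_wait(&dev->wq, &wait, TASK_UNINTERRUPTIBLE);

	/*
	 * BROKEN: a blocking primitive on the way to schedule().  If
	 * mutex_lock() blocks, it wakes up with ->state back at
	 * TASK_RUNNING; with the extra reschedule point it may also
	 * reschedule while we are not on the mutex wait queue.  Either
	 * way, the schedule() below can no longer be relied on to sleep.
	 */
	mutex_lock(&dev->lock);
	if (!dev->event_pending) {
		mutex_unlock(&dev->lock);
		schedule();		/* may return immediately */
	} else {
		mutex_unlock(&dev->lock);
	}

	finish_wait(&dev->wq, &wait);
}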