Re: [PATCH RFC] fs/aio: fix sleeping while TASK_INTERRUPTIBLE

Chris Mason <clm@xxxxxx> · Mon, 29 Dec 2014 10:08:14 -0500

On Wed, Dec 24, 2014 at 9:56 PM, Kent Overstreet <kmo@xxxxxxxxxxxxx> wrote:
> On Mon, Dec 22, 2014 at 07:16:25PM -0500, Chris Mason wrote:
>> The 3.19 merge window brought in a great new warning to catch someone
>> calling might_sleep with their state != TASK_RUNNING.  The idea was to
>> find buggy code locking mutexes after calling prepare_to_wait(), kind
>> of like this:
>
> Ben just told me about this issue.
>
> IMO, the way the code is structured now is correct, I would argue the
> problem is with the way wait_event() works - the way they have to mess
> with the global-ish task state when adding a wait_queue_t to a
> wait_queue_head (who came up with these names?)

Grin, probably related to the guy who made closure_wait() not actually
wait.

The advantage of the wait_queue_head_t setup is that it's a very well
understood mechanism for sleeping on something without missing wakeups.
The locking overhead for the waitqueues can be a problem for lots of
waiters on the same queue, but otherwise the overhead is low.

I think closures are too big a hammer for this problem, unless 
benchmarks show we need the lockless lists (I really like that part).  
I do hesitate to make big changes here because debugging AIO hangs is 
horrible.  The code is only tested by a few workloads, and we can go a 
long time before problems are noticed.  When people do hit bugs, we 
only notice the ones where applications pile up in getevents.  
Otherwise it's just strange performance changes that we can't explain 
because they are hidden in the app's AIO state machine.

When I first looked at the warning, I didn't realize that might_sleep 
and friends were setting a preempted flag to make sure the task wasn't 
removed from the runqueue.  So I thought we'd potentially sleep forever 
(thanks Peter for details++).  The real risk here is burning CPU in the 
running state, potentially a lot of it if the mutex is highly 
contended. We've probably been hitting this for a while, but since we 
test AIO performance with fast storage, the burning just made us look 
faster.
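
To make the burning concrete, here is a toy userspace model of the loop.
It is a sketch, not kernel code: every name in it (toy_task, toy_ring,
toy_schedule and so on) is made up for illustration, and it assumes the
simplest possible behavior, namely that a contended mutex_lock inside the
condition check leaves the task in TASK_RUNNING, and that schedule()
returns right away for a task in TASK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the task states involved; not the kernel's values. */
enum toy_state { TOY_TASK_RUNNING, TOY_TASK_INTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	int real_sleeps;	/* schedule() calls that actually blocked */
	int spins;		/* schedule() calls that returned immediately */
};

struct toy_ring {
	int now;		/* simulated time, in ticks */
	int ready_at;		/* tick at which events show up */
	bool lock_contended;	/* does the ring lock make us sleep? */
};

static void toy_prepare_to_wait(struct toy_task *t)
{
	t->state = TOY_TASK_INTERRUPTIBLE;
}

static void toy_finish_wait(struct toy_task *t)
{
	t->state = TOY_TASK_RUNNING;
}

/* Assumption: sleeping on a contended mutex and waking up again
 * leaves the task in the running state. */
static void toy_mutex_lock(struct toy_task *t, const struct toy_ring *r)
{
	if (r->lock_contended)
		t->state = TOY_TASK_RUNNING;
}

/* The wait condition: takes the ring lock, then looks for events. */
static bool toy_read_events(struct toy_task *t, const struct toy_ring *r)
{
	toy_mutex_lock(t, r);
	return r->now >= r->ready_at;
}

static void toy_schedule(struct toy_task *t, struct toy_ring *r)
{
	if (t->state == TOY_TASK_RUNNING) {
		/* Not really asleep: we come straight back and burn a tick. */
		t->spins++;
		r->now++;
	} else {
		/* Really asleep: we block until the wakeup when events arrive. */
		t->real_sleeps++;
		r->now = r->ready_at;
		t->state = TOY_TASK_RUNNING;
	}
}

/* Run the wait loop once and report how the task spent its time. */
struct toy_task toy_run(bool lock_contended, int ready_at)
{
	struct toy_task t = { TOY_TASK_RUNNING, 0, 0 };
	struct toy_ring r = { 0, ready_at, lock_contended };

	for (;;) {
		toy_prepare_to_wait(&t);
		if (toy_read_events(&t, &r))
			break;
		toy_schedule(&t, &r);
	}
	toy_finish_wait(&t);
	assert(t.state == TOY_TASK_RUNNING);
	return t;
}
```

In this model, toy_run(false, 5) sleeps once and never spins, while
toy_run(true, 5) never sleeps and spins five times. The waiter still
sees its events either way, which is why nothing hangs; the only
difference is whether the wait costs CPU.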

-chris
