On Thu, 20 Mar 2014 15:28:33 -0400
Jeffrey Layton <jlayton@xxxxxxxxxx> wrote:

> Nice analysis! I think eventually we'll need to overhaul this code not

Note that Ulrich Obergfell helped a bit with the initial analysis. From a customer core dump, he found that the kworker thread was blocked on the cinode->lock_sem and that the reader was blocked as well. That was enough for me to find where the problem lay.

> to use rw semaphores, but that's going to take some redesign. (Wonder
> if we could change it to use seqlocks or something?)
>
> Out of curiosity, does this eventually time out and unwedge itself?
> Usually when the server doesn't get a response to an oplock break in
> around a minute or so it gives up and allows the thing that caused the
> oplock break to proceed anyway. Not great for performance but it ought to
> eventually make progress due to that.

No, I believe it's hard locked. Nothing is going to wake up the oplock break if it is blocked on a down_read(); only the release of the rwsem will do that. It's a subtle consequence of the way the kworker threads are handled.

>
> In any case, this looks like a reasonable fix for now, but I suspect you
> can hit similar problems in the write codepath too. What may be best is to
> turn this around and queue the oplock break to the new workqueue
> instead of the read completion job.

Or perhaps give both read and write their own workqueues? We have to look at all the work queue handlers, be careful about any users that take the lock_sem, and separate them out.

-- 
Steve