On Tue, 13 Oct 2009, Paul Fulghum wrote: > > This is correct, the last buffer is not passed to tty_buffer_free() > if it is the last in the list so tail is maintained. > There is no free space in it so no new data can be added. > There is no place where tail is null while the spinlock > is released in preparation for calling receive_buf. > I still can't spot any flaw in the current locking. Do you even bother reading my emails? Let me walk through an example of where the locking F*CKS UP, exactly because it's broken. thread1 thread2 thread3 flush_to_ldisc set_bit(TTY_FLUSHING) buf.head = NULL ... ..release lock.. .. sleep in ->receive_buf .. flush_to_ldisc set_bit(TTY_FLUSHING) .. head==NULL .. clear_bit(TTY_FLUSHING) .. release lock .. tty_ldisc_flush() -> tty_buffer_flush() TTY_FLUSHING not set! -> __tty_buffer_flush() -> tty->buf.tail = NULL and now you're screwed. See? You have both 'buf.tail' and 'buf.head' both being NULL, and look what happens in that case 'tty_buffer_request_room()' if some new data comes in? Right: it will add the buffer to both tail and head. And notice how 'thread1' is still inside flush_to_ldisc()! The buffer that got added will be overwritten by the old one, and now tail and head no longer match. Or another flush_to_ldisc() comes in, and now it won't be a no-op any more, and it will find the new data, and run ->receive_buf concurrently with the old receive_buf from thread1. And the whole reason was that there were some very odd locking rules: buf.head=NULL meant "don't flush", and "TTY_FLUSHING is set" meant "don't clear 'buf.head'", and but the "don't flush" case still cleared TTY_FLUSHING (after not flushing), and it all messed up. I could just have fixed it (move the "clear_bit(TTY_FLUSHING)" but up, but the fact is, once you fix that, it then becomes obvious that "buf.head=NULL" really is the wrong thing to test in the first place, and we should just use TTY_FLUSHING instead, and simply _remove_ the odd "buf.head=NULL is special" case. Which is what my patch did > Your statement that the locking is too clever/subtle is > clearly true since I am struggling to work this out again. I have to say that the only case I could make up that is _clearly_ a bug is the above very contrieved example. I don't really think something like the above happens in reality. But it's an example of bad locking, and what happens when the locking logic isn't obvious. There may be other cases where the locking fails, and I just didn't find them. Or the patch may simply not fix anything in practice, and nobody has ever actually triggered the bad locking in real life. I dunno. I just do know that the locking was too damn subtle. Any time people do ad-hoc locking with "clever" schemes, it's almost invariably buggy. So the rule is: just don't do that. Make the locking rules "obvious". Don't have subtle rules about "if head is NULL, then we're not going to add any new buffers to it, except if tail is also NULL". Because look above what happens, and see how complicated it was to even see the bug. Linus -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html