Greg,

As before, this patchset is dependent on the 'ldsem patchset'. The
reason is that this series abandons tty->receive_room as a flow
control mechanism (because that requires locking), and the TIOCSETD
ioctl _without ldsem_ uses tty->receive_room to shut off i/o.

It is also dependent on 'n_tty fixes' which I recently resent to you.

Regards,
Peter Hurley

*******

This patchset is the 1st of 4 patchsets which implements an almost
entirely lockless receive path from driver to user-space.

Non-rigorous performance measurements show a 9~15x speed improvement
on SMP in end-to-end copying with all 4 patchsets applied.

** v4 changes **
- Rebased on tty.git/tty-next of 14 Jun

** v3 changes **
- Instead of a new receive_room() ldisc method which requires
  acquiring the termios_rwsem twice for every flip buffer received,
  this patchset version adds an alternate receive_buf2() ldisc method
  for use with flow-controlled line disciplines (like N_TTY). This
  also fixes a race when termios can be changed between computing the
  receive space available and the subsequent receive_buf().
- Converts vt paste_selection() to use a helper function for this new
  ldisc method.
- Protects the n_tty_write() path from termios changes.
- Optimizes the N_TTY throttle/unthrottle by only offering termios
  read-safety to the driver throttle()/unthrottle() methods.
- Special-cases pty throttle/unthrottle to avoid multiple atomic
  operations for every read.

** v2 changes **
- Rebased on top of 'tty: Fix race condition if flushing tty flip buffers'
- I forgot to mention; this is ~35% faster on end-to-end tests on SMP.

This patchset implements lockless receive from tty flip buffers to the
n_tty read buffer and lockless copy into the user-space read buffer.

By lockless, I'm referring to the fine-grained read_lock formerly used
to serialize access to the shared n_tty read buffer (which wasn't being
used everywhere it should have been).
In the current n_tty, the read_lock is grabbed a minimum of 2 times per
byte (not 3, as first written)!

The read_lock is unnecessary to serialize access between the flip
buffer work and the single reader, as this is a
single-producer/single-consumer pattern.

However, other threads may attempt to read or modify the buffer
indices, notably for buffer flushing and for setting/resetting termios
(there are some others). In addition, termios changes can cause havoc
while the tty flip buffer work is pushing more data. Read more about
that here:
  https://lkml.org/lkml/2013/2/22/480

Both hurdles are overcome with the same mechanism: converting the
termios_mutex to a r/w semaphore (just a normal one :).

Both the receive_buf() path and the read() path claim a reader lock on
the termios_rwsem. This prevents concurrent changes to termios. Also,
flush_buffer() and TIOCINQ ioctl obtain a write lock on the
termios_rwsem to exclude the flip buffer work and user-space read from
accessing the buffer indices while resetting them.

This patchset also implements a block copy from the read_buf into the
user-space buffer in canonical mode (rather than the current
byte-by-byte method).
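
For readers unfamiliar with the single-producer/single-consumer pattern
mentioned above, here is a minimal user-space sketch in C11. It is not
the n_tty code: struct spsc_ring, ring_put(), ring_get() and RING_SIZE
are hypothetical names, and C11 atomics stand in for the kernel's own
ordering primitives. The indices run free and are masked only when the
buffer is accessed, in the spirit of "n_tty: Don't wrap input buffer
indices at buffer size".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 16u	/* hypothetical size; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

struct spsc_ring {
	_Atomic size_t head;	/* written only by the producer */
	_Atomic size_t tail;	/* written only by the consumer */
	unsigned char buf[RING_SIZE];
};

/* Producer side: store up to n bytes, return how many were stored. */
static size_t ring_put(struct spsc_ring *r, const unsigned char *src, size_t n)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* acquire: the consumer is done with every slot before 'tail' */
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	size_t room = RING_SIZE - (head - tail);
	size_t i;

	if (n > room)
		n = room;
	for (i = 0; i < n; i++)
		r->buf[(head + i) & (RING_SIZE - 1)] = src[i];
	/* release: publish the data before the new head becomes visible */
	atomic_store_explicit(&r->head, head + n, memory_order_release);
	return n;
}

/* Consumer side: copy out up to n bytes, return how many were copied. */
static size_t ring_get(struct spsc_ring *r, unsigned char *dst, size_t n)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* acquire: see the data the producer published before 'head' */
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	size_t avail = head - tail;
	size_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		dst[i] = r->buf[(tail + i) & (RING_SIZE - 1)];
	/* release: hand the consumed slots back to the producer */
	atomic_store_explicit(&r->tail, tail + n, memory_order_release);
	return n;
}
```

Each index has exactly one writer, so the two sides need no lock between
them. The sketch deliberately leaves out what this series uses the
termios_rwsem for: a third party (a flush, a termios change) that wants
to reset the indices still has to exclude both sides first.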

Peter Hurley (24):
  tty: Don't change receive_room for ioctl(TIOCSETD)
  tty: Simplify tty buffer/ldisc interface with helper function
  tty: Make ldisc input flow control concurrency-friendly
  n_tty: Factor canonical mode copy from n_tty_read()
  n_tty: Line copy to user buffer in canonical mode
  n_tty: Split n_tty_chars_in_buffer() for reader-only interface
  tty: Deprecate ldisc .chars_in_buffer() method
  n_tty: Get read_cnt through accessor
  n_tty: Don't wrap input buffer indices at buffer size
  n_tty: Remove read_cnt
  tty: Convert termios_mutex to termios_rwsem
  n_tty: Access termios values safely
  n_tty: Replace canon_data with index comparison
  n_tty: Make N_TTY ldisc receive path lockless
  n_tty: Reset lnext if canonical mode changes
  n_tty: Fix type mismatches in receive_buf raw copy
  n_tty: Don't wait for buffer work in read() loop
  n_tty: Separate buffer indices to prevent cache-line sharing
  tty: Only guarantee termios read safety for throttle/unthrottle
  n_tty: Move chars_in_buffer() to factor throttle/unthrottle
  n_tty: Factor throttle/unthrottle into helper functions
  n_tty: Move n_tty_write_wakeup() to avoid forward declaration
  n_tty: Special case pty flow control
  n_tty: Queue buffer work on any available cpu

 drivers/net/irda/irtty-sir.c |   8 +-
 drivers/tty/n_tty.c          | 662 ++++++++++++++++++++++++++-----------------
 drivers/tty/pty.c            |   4 +-
 drivers/tty/tty_buffer.c     |  34 ++-
 drivers/tty/tty_io.c         |  15 +-
 drivers/tty/tty_ioctl.c      |  90 +++---
 drivers/tty/tty_ldisc.c      |  13 +-
 drivers/tty/vt/selection.c   |   4 +-
 drivers/tty/vt/vt.c          |   4 +-
 include/linux/tty.h          |  21 +-
 include/linux/tty_ldisc.h    |  13 +
 11 files changed, 530 insertions(+), 338 deletions(-)

-- 
1.8.1.2