Re: [PATCH v5 00/44] ldisc patchset

Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> · Tue, 12 Mar 2013 12:47:26 -0400

On Mon, 2013-03-11 at 19:28 -0700, Michel Lespinasse wrote:
> On Mon, Mar 11, 2013 at 1:44 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
> > Greg,
> > This patchset includes
> >   'tty: Drop lock contention stat from ldsem trylocks'
> > so no need to apply that on this series. Also, I noticed you
> > kept the 'tty is NULL' removal on a different branch so I left
> > my patch in this series that removes it.
> >
> > This series applies cleanly to tty-next.
> >
> > v5 changes:
> >
> > After completing an audit of the recursive use of ldisc
> > references, I discovered the _blocking_ recursive acquisition
> > of ldisc references was limited to line disciplines misusing
> > the tty_perform_flush() function.
> > With that now resolved in,
> >   'tty: Fix recursive deadlock in tty_perform_flush()'
> > the recursion design in ldsem has been removed.
> >
> > The recursion removal is in its own patch,
> >   'tty: Remove ldsem recursion support'
> > to ease review for those that have already reviewed the
> > ldsem implementation.
> >
> > In addition, this patchset implements lock stealing derived
> > from the work of Michel Lespinasse <walken@xxxxxxxxxx> on
> > writer lock stealing in rwsem.
> >
> > Although the rwsem write lock stealing changes are motivated
> > by performance criteria, these changes are motivated by reduced
> > code line count and simplicity of design.
> >
> > *** Edited below to remove recursion discussion ***
> >
> > Back in early December I realized that a classic read/write semaphore
> > with writer priority was the ideal mechanism for handling the
> > line discipline referencing.
> >
> > Line discipline references act as "readers"; closing or changing the
> > line discipline is prevented while these references are outstanding.
> > Conversely, line discipline references should not be granted while
> > the line discipline is closing or changing; these are the "writers".
> >
> > Unfortunately, the existing rwsem uses a FIFO priority for
> > waiting threads and does not support timeouts.
> >
> > So this implements just that: a writer-priority
> > read/write semaphore with timed waits.
> 
> Thanks for eliminating the recursion requirement. I think this really
> helps - I didn't like that multiple readers with a colliding current
> hash could basically starve out a writer forever.

Yeah, writer starvation was a definite down-side. It looked like the
only option at the time.

> Not knowing anything about the tty layer, I am curious about the
> context for your other requirements. What are ldisc references taken
> for and for how long are they held ? I am surprised that the writers
> may hit a 5 second timeout (because I didn't expect the references to
> be held for very long).

Some background:

A line discipline (ldisc) abstracts raw device-side i/o for the tty
layer. The line discipline is responsible for i/o translation,
buffering, and flow control. Line disciplines can also provide other
features, such as i/o routing; eg., N_PPP routes serial i/o out to the
network stack, instead of providing user-space file i/o.

Line disciplines are packaged as loadable drivers. Each tty device has
its own 'instance' of a line discipline. In addition, a tty's line
discipline can be changed via the TIOCSETD ioctl.

The N_TTY line discipline is the default ldisc and implements the
standard terminal control features such as ^C -> SIGINT, CR/NL mangling,
echoing and line-edit mode.

While the tty layer manages the state of the tty device and provides the
user-space file i/o interface, the line discipline actually implements
the file i/o. This means that user-space blocking reads and blocking
polls block in the respective line discipline i/o methods.

Ldisc references:

Line discipline (ldisc) references pin the tty device's ldisc instance
while in use. This can be short -- eg., when device-side input is
received -- or can be really long -- eg., getty uses blocking read to
park the virtual terminals.

Two requirements complicate this otherwise straightforward
reference-counting pattern.

First, the TIOCSETD ioctl must change the line discipline atomically
wrt. user-space file i/o _and_ device-side i/o. Of course, this can only
happen if there are no existing references to this ldisc. Since ldisc
references have highly variable lifetimes, changing the line discipline
may fail if all existing references to this ldisc are not released. Thus
the lock timeout requirement.

Second, ttys can be hung up, synchronously (like logout and usb
disconnect) and asynchronously (like carrier loss). A tty hangup
recycles the ldisc so at the conclusion of the tty hangup, a fresh reset
ldisc is ready for use. This idiosyncrasy is both a security measure,
and also a requirement to support legacy logins which don't properly
reset the terminal state.

>From the point at which a hangup occurs, all current and future i/o must
cease. Foreground processes which may have existing ldisc references are
signalled so they can exit the ldisc i/o loop and drop their references.

> Also why the write-priority requirement rather than reader-writer
> fairness ? Is it to make it less likely to hit the writer timeouts ?

Since tty i/o can be really [painfully] slow, allowing waiting future
references to succeed is not an option.

> In short: I am worried about the introduciton of a new lock type, and
> would be happier if rwsem could be made to fit. BTW, extending rwsem
> itself to add writer timeouts seems quite doable (but making it work
> as a write priority lock would seem like a bad idea).

I understand the concern regarding the potential proliferation of new
lock types. Lock implementations are hard to get right, and no one wants
to debug 7 different lock policy implementations of a read/write
semaphore.

OTOH, a lack of existing options has spawned a DIY approach without
higher-order locks that is rarely correct, but which goes largely
unnoticed exactly because it's not a new lock. A brief review of the
hangs, races, and deadlocks fixed by this patchset should be convincing
enough of that fact. In my opinion, this is the overriding concern.

The two main problems with a one-size-fits-all lock policy is that,
1) lock experts can't realistically foresee the consequences of policy
changes without already being experts in the subsystems in which that
lock is used. Even domain experts may miss potential consequences, and
2) domain experts typically wouldn't even consider writing a new lock.
So they make do with atomic bit states, spinlocks, reference counts,
mutexes, and waitqueues, making a mostly-functional, higher-order lock. 

I won't make a case for 2) above, as I think that's self-evident.

As an example of 1) above, Alex's and your work on write lock stealing
is predicated on the proposition that writers advancing in front of
waiting readers is acceptable; that may or may not be true for every use
case.  [ It's not my intention to be critical here -- obviously I
appreciate your work because I used it in ldsem. :) ]

Perhaps a future direction for rwsem would be to provide a selectable
lock policy (fifo, mostly-fair, writer-first) on initialization so that
the different use cases can be easily accomodated?

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html