Re: [PATCH] tty: make sure a BUG is hit if tty_port will be destroyed before tty

Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> · Fri, 17 May 2013 14:06:05 -0400

On 05/17/2013 12:41 PM, Alexander Holler wrote:
Am 17.05.2013 17:31, schrieb Greg Kroah-Hartman:
On Fri, May 17, 2013 at 09:12:08AM +0200, Alexander Holler wrote:
tty depends on tty_port until tty_release() was called. Make sure a BUG
will be hit, if tty_port will be destroyed before tty.

So you want to ensure that we crash a machine?  No, please never add

Exactly. Let me quote myself:

 >> As described before, it ends up with memory corruption because freed
 >> memory is used, so if a BUG() happens, it doesn't help much. E.g. with
 >> kernel 3.9.2 I never have seen a bug, just a rebooting machine
 >> (sometimes minutes after the real bug happened).

BUG() statements to the kernel, unless something _really_ bad is going
to happen if we don't call it.  I never want to stop a machine from
running, do you?

Yes. I'm not sure how you define _really_ bad, but a memory corruption with undefined result is exactly how I would define such.

First, I like the idea of a diagnostic here. But I'm with Greg on
this; BUG() is overkill. Just because the specific path which you found
only kills the process doesn't mean that other callers might not
prompt machine halt.

The memory corruption happens as a result of the tty_port being freed
by tty_port_destructor(). So a suitable diagnostic is to detect the
condition, WARN, and return without actually performing the destroy, yes?

And in the case of rfcomm, the box doesn't stop, at least not here. Just the process is killed together with an easy to identfiy oops. And the BUG_ON() prevents that memory will become corrupted and the machine is still usable afterwards. If that isn't a use case for BUG_ON(), I really don't know what else would be a use case for it.

I can't take this as-is, why not just fix the root problem?

First I'm still not sure about the root problem and awaiting some response to my mail before that patch. As noted in the mail with the patch, 3.10-rc1 looks different, so the it might already be fixed there, even if rfcomm doesn't handle the tty as it (now in 3.8 and 3.9) should be (I haven't tested 3.10-rc1 up to now).

This problem exists from the commit identified up through current
mainline. It has not been fixed yet. I'll specifically respond to what
I believe is the correct solution in that thread.

Second, if I would fix the bug in rfcomm, as Peter suggested, I still would not know if the same problem doesn't appear in any other user of ttys too, so even if I would fix rfcomm, I still would want that BUG_ON() to make sure I don't get a memory corruption whenever another similiar bug is hit.

Only a handful of tty drivers use the ref-counted destroy method of
tty_port cleanup (the other method being calling tty_port_destroy()
directly). They are:

  drivers/tty/pty.c
  drivers/mmc/card/sdio_uart.c
  drivers/idsn/capi/capi.c
  drivers/usb/class/cdc-acm.c
  drivers/tty/n_gsm.c
  drivers/tty/hvc/hvc_console.c
  drivers/tty/hvc/hvcs.c
  net/bluetooth/rfcomm/tty.c

Of these, I know and have reviewed the first two drivers; their
usage is correct.

While reviewing these additional drivers, maybe we should review
whether it makes sense to
  * require and document the tty_port to live at least
    to ops->cleanup() (currently the default)
  * allow tty_port lifetime to be completely independent of
    a tty's lifetime
  * remove ref-counting from tty_port

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html