Re: [PATCH 0/4] treewide: fix interrupted release

Daniel Vetter <daniel@xxxxxxxx> · Tue, 15 Oct 2019 16:07:26 +0200

On Mon, Oct 14, 2019 at 06:13:26PM +0200, Johan Hovold wrote:
> On Mon, Oct 14, 2019 at 10:48:47AM +0200, Daniel Vetter wrote:
> > On Fri, Oct 11, 2019 at 11:36:33AM +0200, Johan Hovold wrote:
> > > On Thu, Oct 10, 2019 at 03:50:43PM +0200, Daniel Vetter wrote:
> > > > On Thu, Oct 10, 2019 at 03:13:29PM +0200, Johan Hovold wrote:
> > > > > Two old USB drivers had a bug in them which could lead to memory leaks
> > > > > if an interrupted process raced with a disconnect event.
> > > > > 
> > > > > Turns out we had a few more drivers in other subsystems with the same
> > > > > kind of bug in them.
> > > 
> > > > Random funny idea: Could we do some debug annotations (akin to
> > > > might_sleep) that splats when you might_sleep_interruptible somewhere
> > > > where interruptible sleeps are generally a bad idea? Like in
> > > > fops->release?
> > > 
> > > There's nothing wrong with interruptible sleep in fops->release per se,
> > > it's just that drivers cannot return -ERESTARTSYS and friends and expect
> > > to be called again later.
> > 
> > Do you have a legit use case for interruptible sleeps in fops->release?
> 
> The tty layer depends on this for example when waiting for buffered
> writes to complete (something which may never happen when using flow
> control).
> 
> > I'm not even sure killable is legit in there, since it's an fd, not a
> > process context ...
> 
> It will be run in process context in many cases, and for ttys we're good
> AFAICT.

Huh, I read the code a bit: all the ->shutdown callbacks have a void return
type. But there are indeed interruptible sleeps in there. Doesn't this
break userspace that expects close() to actually flush the tty?

Imo if your ->release callback feels like it should wait to guarantee
something userspace expects, then using a wait_interruptible/killable
variant feels like a bug. Or alternatively, the wait isn't really needed
in the first place.
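To make that trade-off concrete, here is a toy userspace model (plain C; every name is hypothetical and nothing here is kernel API) of a ->release that drains queued output. The interruptible variant gives up when a signal arrives, so close() completes with data still queued and the guarantee is silently lost. The bounded variant ignores signals but stops after a fixed budget, which is one possible answer to the flow-control stall mentioned for tty, not a claim about what the tty layer does:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: output still queued for transmission. */
struct model_dev {
	int pending;       /* bytes not yet sent */
	bool flow_stopped; /* e.g. flow control asserted: nothing drains */
};

/* One tick of hardware progress. */
static void model_tick(struct model_dev *d)
{
	assert(d->pending >= 0);
	if (!d->flow_stopped && d->pending > 0)
		d->pending--;
}

/*
 * Interruptible-style drain: a signal arriving at tick signal_at
 * aborts the wait, so close() returns with data still pending.
 */
static int drain_interruptible(struct model_dev *d, int signal_at)
{
	for (int t = 0; d->pending > 0; t++) {
		if (t == signal_at)
			return -1; /* gave up: data is still queued */
		model_tick(d);
	}
	return 0;
}

/*
 * Bounded uninterruptible drain: ignores signals, but stops after
 * max_ticks so a stalled peer cannot block close() forever.
 */
static int drain_bounded(struct model_dev *d, int max_ticks)
{
	for (int t = 0; d->pending > 0 && t < max_ticks; t++)
		model_tick(d);
	return d->pending == 0 ? 0 : -1;
}
```

With 10 bytes queued and a signal at tick 3, drain_interruptible() leaves 7 bytes behind, while drain_bounded() with a budget of 100 ticks flushes everything unless flow is stopped, in which case it returns after the budget with all 10 bytes still queued.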

> > > The return value from release() is ignored by vfs, and adding a splat in
> > > __fput() to catch these buggy drivers might be overkill.
> > 
> > Ime once you have a handful of instances of a broken pattern, creating a
> > check for it (under a debug option only ofc) is very much justified.
> > Otherwise they just come back to life like the undead, all the time. And
> > there's a _lot_ of fops->release callbacks in the kernel.
> 
> Yeah, you have a point.
> 
> But take tty again as an example, the close tty operation called from
> release() is declared void so there's no propagated return value for vfs
> to check.
> 
> It may even be better to fix up the 100 or so callbacks potentially
> returning non-zero and make fops->release void so that the compiler
> would help us catch any future bugs and also serve as a hint for
> developers that returning errnos from fops->release is probably not
> what you want to do.
> 
> But that's a lot of churn of course.

Hm, indeed ->release has int as its return type. I guess that's needed
for file I/O errno and similar stuff ...
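Whatever the int is for, vfs drops it (as noted above), so a release that bails out early on a signal, for instance after an interrupted interruptible lock, simply leaks. A toy userspace model of that bug class, with hypothetical names throughout; ERESTARTSYS is redefined locally because it is kernel-internal, and MODEL_DEBUG_RELEASE is a made-up stand-in for a debug-only check on the return value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Kernel-internal value (not in userspace <errno.h>), redefined for the model. */
#define ERESTARTSYS 512
/* Hypothetical debug switch standing in for a debug config option. */
#define MODEL_DEBUG_RELEASE 1

struct model_file {
	void *private_data; /* driver state that must be freed exactly once */
	int (*release)(struct model_file *f, bool signal_pending);
};

static int live_allocs;    /* driver state allocated but not yet freed */
static int release_errors; /* non-zero returns seen by the debug check */

static void *state_alloc(void)
{
	live_allocs++;
	return malloc(16);
}

static void state_free(void *p)
{
	live_allocs--;
	free(p);
}

/* Buggy: an interrupted lock/wait makes it skip teardown. */
static int buggy_release(struct model_file *f, bool signal_pending)
{
	if (signal_pending)
		return -ERESTARTSYS; /* nobody will ever call us again */
	state_free(f->private_data);
	return 0;
}

/* Fixed: teardown always happens; a signal cannot skip it. */
static int fixed_release(struct model_file *f, bool signal_pending)
{
	(void)signal_pending;
	state_free(f->private_data);
	return 0;
}

/* Like __fput(): ->release is called exactly once and its result is dropped. */
static void model_fput(struct model_file *f, bool signal_pending)
{
	assert(f->release != NULL);
	int ret = f->release(f, signal_pending);
#if MODEL_DEBUG_RELEASE
	if (ret != 0) {
		release_errors++;
		fprintf(stderr, "->release returned %d, which is ignored\n", ret);
	}
#endif
	(void)ret;
}
```

With a signal pending, the buggy variant leaves its allocation behind and trips the debug counter, while the fixed one cleans up. The check only sees errors that are actually returned, which is why a void close path like the tty one gives it nothing to look at.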

Still, a void return value doesn't catch funny stuff like doing
interruptible waits that occasionally fail when a process likes to use
signals and also uses some library somewhere to do something. In graphics
we have exactly that, with Xorg loving signals for various things.
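The might_sleep-style annotation floated earlier would catch this independently of the return type. A rough userspace sketch, where release_enter(), release_exit(), might_sleep_interruptible() and driver_lock_interruptible() are all hypothetical names; in the kernel the marker would presumably live in per-task state and the hook would sit in the interruptible/killable sleep primitives:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-task marker: non-zero while inside ->release. */
static _Thread_local int in_release;
static int splats; /* number of warnings emitted */

static void release_enter(void)
{
	in_release++;
}

static void release_exit(void)
{
	assert(in_release > 0);
	in_release--;
}

/* Hypothetical debug hook, called by interruptible/killable sleep primitives. */
static void might_sleep_interruptible(const char *where)
{
	if (in_release) {
		splats++;
		fprintf(stderr, "interruptible sleep in ->release: %s\n", where);
	}
}

/* A driver path taking an interruptible lock, e.g. from ioctl or release. */
static void driver_lock_interruptible(const char *where)
{
	might_sleep_interruptible(where);
	/* ... the actual interruptible wait would follow here ... */
}
```

The same driver helper is silent when reached from an ioctl-like path and warns once when reached between release_enter() and release_exit(). Given the tty case above, where the interruptible wait on close is intentional, such a check would likely need an explicit opt-out.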
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch