Re: [PATCH 3/5] fs: remove ki_nbytes

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Wed, 4 Feb 2015 13:17:30 -0500 (EST)

On Wed, 4 Feb 2015, Al Viro wrote:

> [USB folks Cc'd]

Incidentally, Al, have you seen this email?

	http://marc.info/?l=linux-usb&m=142295011402339&w=2

I encouraged the writer to send in a patch but so far there has been no 
reply.

> On Mon, Feb 02, 2015 at 03:26:17PM +0100, Christoph Hellwig wrote:
> 
> > I would bet the behavior difference is a bug, might be worth to Cc the
> > usb folks on this issue.  I bet we'd want the more complex behavior
> > for both variants.
> 
> [Context for USB people: The difference in question is what ep_read() does
> when it is called on write endpoint that isn't isochronous;

You're talking about drivers/usb/gadget/legacy/inode.c, right?

>  it halts the
> sucker and fails with EBADMSG, while ep_aio_read() handles all write endpoints
> as isochronous ones - fails with EINVAL; FWIW, I agree that it's probably
> a bug]

It's not a bug; it's by design.  That's how you halt an endpoint in 
gadgetfs -- by doing a synchronous I/O call in the "wrong" direction.

> Sadly, that's not the only problem in there ;-/  _This_ one really has
> the "what if single-segment AIO read tries to dereference iovec after
> the caller is gone" bug you suspected in fs/direct-io.c; we have
> static void ep_user_copy_worker(struct work_struct *work)
> {
>         struct kiocb_priv *priv = container_of(work, struct kiocb_priv, work);
>         struct mm_struct *mm = priv->mm;
>         struct kiocb *iocb = priv->iocb;
>         size_t ret;
> 
>         use_mm(mm); 
>         ret = ep_copy_to_user(priv);
>         unuse_mm(mm);
> 
>         /* completing the iocb can drop the ctx and mm, don't touch mm after */
>         aio_complete(iocb, ret, ret);
> 
>         kfree(priv->buf);
>         kfree(priv);
> }
> called via schedule_work() from ->complete() of usb_request allocated and
> queued by ->aio_read().  It very definitely _can_ be executed after return
> from ->aio_read() and aio_run_iocb().  And ep_copy_to_user() dereferences
> the iovec given to ->aio_read(); _not_ its copy as f_fs.c does.
> 
> Do io_submit(2) with several IOCB_CMD_PREAD requests, and you'll almost
> certainly get the data from the first one copied to the destination of
> the second one instead.  It shouldn't be hard to reproduce.  And that,
> of course, is not the worst possible outcome...
> 
> I'm going to add copying of iovec in async read case.  And AFAICS, that one
> is -stable fodder.  See vfs.git#gadget for f_fs.c conversion; I haven't
> pushed legacy/inode.c stuff yet - I need to pull the fix of the bug above
> into the beginning of that pile first.
> 
> FWIW, I don't believe that it makes sense to do iovec copying in
> aio_run_iocb(); note that most of the instances will be done with
> iovec before they return there.

That's true even for gadgetfs in the write case.

>  These two were the sole exceptions;
> function/f_fs.c did copying, legacy/inode.c didn't.  Most of the
> ->aio_read/->read_iter instances (including ones that *do* return
> EIOCBQUEUED) only access iovec synchronously; usually that's done
> by grabbing the pages to copy into before we get aronud to starting
> IO.  legacy/inode.c is the only instance to step into that kind of bug.
> function/f_fs.c also had a fun bug, BTW - failure in AIO ended up leaking
> io_data (plus iovec copy in case of aio_read()).  Looks like another
> -stable fodder, if less serious one...  See b17d2ded6 (gadget/function/f_fs.c:
> close leaks) in vfs.git#gadget for that one.
> 
> I plan to pull the fix for use-after-free in the beginning of that queue
> (in an easy to backport form) and then have ep_aio_read/ep_aio_write
> start doing the halt-related bits as in ep_read/ep_write.  With that it's
> trivial to convert that sucker along the same lines as function/f_fs.c.

I don't think there's any need to make the async routines do the
halt-related stuff.  After all, it's silly for users to call an async
I/O routine to perform a synchronous action like halting an endpoint.

On the other hand, it would be reasonable to replace the -EBADMSG with
some massaged version of the return code from usb_ep_set_halt(), which
is supposed to return -EAGAIN under some circumstances.  But that would
be an API change, so we probably shouldn't do it...

> All of that, assuming that anybody gives a damn about the driver in question.
> The things like
>                         spin_lock_irq (&dev->lock);
> 			....
> // FIXME don't call this with the spinlock held ...
>                                 if (copy_to_user (buf, dev->req->buf, len))
> seem to indicate that nobody does, seeing that this bug had been there
> since 2003, complete with FIXME ;-/
> 
> If nobody cares about that sucker, git rm would be a better solution, IMO...

It is a legacy driver after all, but some people still use it.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html