[USB folks Cc'd]

On Mon, Feb 02, 2015 at 03:26:17PM +0100, Christoph Hellwig wrote:
> I would bet the behavior difference is a bug, might be worth to Cc the
> usb folks on this issue.  I bet we'd want the more complex behavior
> for both variants.

[Context for USB people: the difference in question is what ep_read()
does when it is called on a write endpoint that isn't isochronous; it
halts the sucker and fails with EBADMSG, while ep_aio_read() handles all
write endpoints as isochronous ones - fails with EINVAL; FWIW, I agree
that it's probably a bug]

Sadly, that's not the only problem in there ;-/  _This_ one really has
the "what if a single-segment AIO read tries to dereference the iovec
after the caller is gone" bug you suspected in fs/direct-io.c; we have

static void ep_user_copy_worker(struct work_struct *work)
{
	struct kiocb_priv *priv = container_of(work, struct kiocb_priv, work);
	struct mm_struct *mm = priv->mm;
	struct kiocb *iocb = priv->iocb;
	size_t ret;

	use_mm(mm);
	ret = ep_copy_to_user(priv);
	unuse_mm(mm);

	/* completing the iocb can drop the ctx and mm, don't touch mm after */
	aio_complete(iocb, ret, ret);

	kfree(priv->buf);
	kfree(priv);
}

called via schedule_work() from ->complete() of the usb_request allocated
and queued by ->aio_read().  It very definitely _can_ be executed after
return from ->aio_read() and aio_run_iocb().  And ep_copy_to_user()
dereferences the iovec given to ->aio_read(); _not_ its copy, as f_fs.c
does.  Do io_submit(2) with several IOCB_CMD_PREAD requests, and you'll
almost certainly get the data from the first one copied to the
destination of the second one instead.  It shouldn't be hard to
reproduce.  And that, of course, is not the worst possible outcome...

I'm going to add copying of the iovec in the async read case.  And
AFAICS, that one is -stable fodder.

See vfs.git#gadget for the f_fs.c conversion; I haven't pushed the
legacy/inode.c stuff yet - I need to pull the fix of the bug above into
the beginning of that pile first.

FWIW, I don't believe that it makes sense to do iovec copying in
aio_run_iocb(); note that most of the instances will be done with the
iovec before they return there.  These two were the sole exceptions;
function/f_fs.c did the copying, legacy/inode.c didn't.  Most of the
->aio_read/->read_iter instances (including ones that *do* return
EIOCBQUEUED) only access the iovec synchronously; usually that's done by
grabbing the pages to copy into before we get around to starting IO.
legacy/inode.c is the only instance to step into that kind of bug.

function/f_fs.c also had a fun bug, BTW - failure in AIO ended up leaking
io_data (plus the iovec copy in the aio_read() case).  Looks like another
piece of -stable fodder, if a less serious one...  See b17d2ded6
(gadget/function/f_fs.c: close leaks) in vfs.git#gadget for that one.

I plan to pull the fix for the use-after-free into the beginning of that
queue (in an easy-to-backport form) and then have ep_aio_read/ep_aio_write
start doing the halt-related bits as in ep_read/ep_write.  With that it's
trivial to convert that sucker along the same lines as function/f_fs.c.

All of that, assuming that anybody gives a damn about the driver in
question.  The things like

	spin_lock_irq (&dev->lock);
	....
	// FIXME don't call this with the spinlock held ...
	if (copy_to_user (buf, dev->req->buf, len))

seem to indicate that nobody does, seeing that this bug had been there
since 2003, complete with FIXME ;-/  If nobody cares about that sucker,
git rm would be a better solution, IMO...
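
To make the iovec-copy plan above a bit more concrete, here's a rough
sketch of the idea; the struct layout, the ->iv/->nr_segs fields and the
helper name are illustrative only, not necessarily what the actual
vfs.git#gadget fix ends up looking like.  The point is simply that the
async read path stashes its own copy of the iovec in kiocb_priv before
returning -EIOCBQUEUED, so that the completion worker never dereferences
storage that lived on the submitter's stack:

#include <linux/slab.h>
#include <linux/uio.h>

/* illustrative only - the real kiocb_priv in legacy/inode.c has more fields */
struct kiocb_priv_sketch {
	void		*buf;
	struct iovec	*iv;		/* private copy of the caller's iovec */
	unsigned long	nr_segs;
};

/* called from the ->aio_read() path, before the request is queued */
static int stash_iovec(struct kiocb_priv_sketch *priv,
		       const struct iovec *iov, unsigned long nr_segs)
{
	/* the caller's iovec may be gone once we return -EIOCBQUEUED,
	 * so keep our own copy for the completion worker to use */
	priv->iv = kmemdup(iov, nr_segs * sizeof(*iov), GFP_KERNEL);
	if (!priv->iv)
		return -ENOMEM;
	priv->nr_segs = nr_segs;
	return 0;
}

The completion side (ep_copy_to_user() and the final kfree()s in
ep_user_copy_worker()) would then walk and free priv->iv instead of ever
touching the iovec that was passed to ->aio_read().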
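
As for the "halt-related bits as in ep_read/ep_write" part, the shape is
roughly the following - again only a sketch, assuming the usb_ep and its
descriptor are reachable the way ep_read() reaches them, and with a made-up
helper name; not a drop-in diff:

#include <linux/usb/ch9.h>
#include <linux/usb/gadget.h>

/*
 * Sketch: i/o in the "wrong" direction on a non-isochronous endpoint
 * halts it and fails with EBADMSG, the way ep_read()/ep_write() behave;
 * today ep_aio_read()/ep_aio_write() just fail all of them with EINVAL.
 */
static int halt_on_wrong_direction(struct usb_ep *ep,
				   const struct usb_endpoint_descriptor *desc)
{
	if (usb_endpoint_xfer_isoc(desc))
		return -EINVAL;		/* isochronous endpoints can't be halted */
	usb_ep_set_halt(ep);		/* ep_read() does this under dev->lock */
	return -EBADMSG;
}

The aio read path would call that for IN endpoints (and the write path for
OUT ones) before bothering to allocate and queue a request.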