[USB folks Cc'd]

On Mon, Feb 02, 2015 at 03:26:17PM +0100, Christoph Hellwig wrote:
> I would bet the behavior difference is a bug, might be worth to Cc the
> usb folks on this issue.  I bet we'd want the more complex behavior
> for both variants.

[Context for USB people: the difference in question is what ep_read()
does when it is called on a write endpoint that isn't isochronous; it
halts the sucker and fails with EBADMSG, while ep_aio_read() handles all
write endpoints as isochronous ones - fails with EINVAL; FWIW, I agree
that it's probably a bug]

Sadly, that's not the only problem in there ;-/  _This_ one really has
the "what if a single-segment AIO read tries to dereference the iovec
after the caller is gone" bug you suspected in fs/direct-io.c; we have

static void ep_user_copy_worker(struct work_struct *work)
{
	struct kiocb_priv *priv = container_of(work, struct kiocb_priv, work);
	struct mm_struct *mm = priv->mm;
	struct kiocb *iocb = priv->iocb;
	size_t ret;

	use_mm(mm);
	ret = ep_copy_to_user(priv);
	unuse_mm(mm);

	/* completing the iocb can drop the ctx and mm, don't touch mm after */
	aio_complete(iocb, ret, ret);

	kfree(priv->buf);
	kfree(priv);
}

called via schedule_work() from ->complete() of the usb_request allocated
and queued by ->aio_read().  It very definitely _can_ be executed after
return from ->aio_read() and aio_run_iocb().  And ep_copy_to_user()
dereferences the iovec given to ->aio_read(); _not_ its copy, as f_fs.c
does.  Do io_submit(2) with several IOCB_CMD_PREAD requests, and you'll
almost certainly get the data from the first one copied to the
destination of the second one instead.  It shouldn't be hard to
reproduce.  And that, of course, is not the worst possible outcome...

I'm going to add copying of the iovec in the async read case.  And
AFAICS, that one is -stable fodder.

See vfs.git#gadget for the f_fs.c conversion; I haven't pushed the
legacy/inode.c stuff yet - I need to pull the fix of the bug above into
the beginning of that pile first.

FWIW, I don't believe that it makes sense to do iovec copying in
aio_run_iocb(); note that most of the instances will be done with the
iovec before they return there.  These two were the sole exceptions;
function/f_fs.c did the copying, legacy/inode.c didn't.  Most of the
->aio_read/->read_iter instances (including ones that *do* return
EIOCBQUEUED) only access the iovec synchronously; usually that's done by
grabbing the pages to copy into before we get around to starting IO.
legacy/inode.c is the only instance to step into that kind of bug.

function/f_fs.c also had a fun bug, BTW - failure in AIO ended up leaking
io_data (plus the iovec copy in the aio_read() case).  Looks like another
piece of -stable fodder, if a less serious one...  See b17d2ded6
(gadget/function/f_fs.c: close leaks) in vfs.git#gadget for that one.

I plan to pull the fix for the use-after-free into the beginning of that
queue (in an easy-to-backport form) and then have ep_aio_read/ep_aio_write
start doing the halt-related bits as in ep_read/ep_write.  With that it's
trivial to convert that sucker along the same lines as function/f_fs.c.

All of that, assuming that anybody gives a damn about the driver in
question.  The things like

	spin_lock_irq (&dev->lock);
	....
	// FIXME don't call this with the spinlock held ...
	if (copy_to_user (buf, dev->req->buf, len))

seem to indicate that nobody does, seeing that this bug had been there
since 2003, complete with FIXME ;-/  If nobody cares about that sucker,
git rm would be a better solution, IMO...
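
To make the iovec-copy plan above a bit more concrete, here's a rough
sketch of the idea; the struct layout, the ->iv/->nr_segs fields and the
helper name are illustrative only, not necessarily what the actual
vfs.git#gadget fix ends up looking like.  The point is simply that the
async read path stashes its own copy of the iovec in kiocb_priv before
returning -EIOCBQUEUED, so that the completion worker never dereferences
storage that lived on the submitter's stack:

#include <linux/slab.h>
#include <linux/uio.h>

/* illustrative only - the real kiocb_priv in legacy/inode.c has more fields */
struct kiocb_priv_sketch {
	void		*buf;
	struct iovec	*iv;		/* private copy of the caller's iovec */
	unsigned long	nr_segs;
};

/* called from the ->aio_read() path, before the request is queued */
static int stash_iovec(struct kiocb_priv_sketch *priv,
		       const struct iovec *iov, unsigned long nr_segs)
{
	/* the caller's iovec may be gone once we return -EIOCBQUEUED,
	 * so keep our own copy for the completion worker to use */
	priv->iv = kmemdup(iov, nr_segs * sizeof(*iov), GFP_KERNEL);
	if (!priv->iv)
		return -ENOMEM;
	priv->nr_segs = nr_segs;
	return 0;
}

The completion side (ep_copy_to_user() and the final kfree()s in
ep_user_copy_worker()) would then walk and free priv->iv instead of ever
touching the iovec that was passed to ->aio_read().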
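
As for the "halt-related bits as in ep_read/ep_write" part, the shape is
roughly the following - again only a sketch, assuming the usb_ep and its
descriptor are reachable the way ep_read() reaches them, and with a made-up
helper name; not a drop-in diff:

#include <linux/usb/ch9.h>
#include <linux/usb/gadget.h>

/*
 * Sketch: i/o in the "wrong" direction on a non-isochronous endpoint
 * halts it and fails with EBADMSG, the way ep_read()/ep_write() behave;
 * today ep_aio_read()/ep_aio_write() just fail all of them with EINVAL.
 */
static int halt_on_wrong_direction(struct usb_ep *ep,
				   const struct usb_endpoint_descriptor *desc)
{
	if (usb_endpoint_xfer_isoc(desc))
		return -EINVAL;		/* isochronous endpoints can't be halted */
	usb_ep_set_halt(ep);		/* ep_read() does this under dev->lock */
	return -EBADMSG;
}

The aio read path would call that for IN endpoints (and the write path for
OUT ones) before bothering to allocate and queue a request.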