On Sat, Oct 29, 2016 at 8:20 AM, Christoph Hellwig <hch@xxxxxx> wrote: > > We can't as that would not fix the use after free (at least for the lockdep > case - otherwise the call is a no-op). Once iter_op returns aio_complete > might have dropped our reference to the file, and another thread might > have closed the fd so that the fput from aio_complete was the last one. I don't concpetually mind the patch per se, but the repeated if (rw == WRITE) { .. } if (rw == WRITE) { .. } is just insane and makes the code less legible than it should be. Also, honestly, make it use a helper: "aio_file_start_write()" and "aio_file_end_write()" that has the comments and the lockdep games. Because that patch is just too effing ugly. Does something like the attached work for you guys? Linus
fs/aio.c | 33 +++++++++++++++++++++++++++++---- include/linux/fs.h | 1 + 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 1157e13a36d6..3f66331ef90c 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1066,6 +1066,27 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id) return ret; } +/* + * We do file_start_write/file_end_write() to make sure + * we have filesystem freeze protection over the whole + * AIO write sequence, but to make sure that lockdep does + * not complain about the held lock when we return to + * userspace, we tell it that we release and reaquire the + * lock. + */ +static void aio_file_start_write(struct file *file) +{ + file_start_write(file); + __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE); +} + +static void aio_file_end_write(struct file *file) +{ + __sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE); + file_end_write(file); +} + + /* aio_complete * Called when the io request on the given iocb is complete. */ @@ -1078,6 +1099,9 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2) unsigned tail, pos, head; unsigned long flags; + if (kiocb->ki_flags & IOCB_WRITE) + aio_file_end_write(kiocb->ki_filp); + /* * Special case handling for sync iocbs: * - events go directly into the iocb for fast handling @@ -1460,13 +1484,14 @@ static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode, return ret; } - if (rw == WRITE) - file_start_write(file); + if (rw == WRITE) { + /* aio_complete() will end the write */ + req->ki_flags |= IOCB_WRITE; + aio_file_start_write(file); + } ret = iter_op(req, &iter); - if (rw == WRITE) - file_end_write(file); kfree(iovec); break; diff --git a/include/linux/fs.h b/include/linux/fs.h index 16d2b6e874d6..db600e9bb1b4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -321,6 +321,7 @@ struct writeback_control; #define IOCB_HIPRI (1 << 3) #define IOCB_DSYNC (1 << 4) #define IOCB_SYNC (1 << 5) +#define IOCB_WRITE (1 << 6) struct kiocb { struct file *ki_filp;