On Fri, Nov 17, 2023 at 9:46 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > On Tue, Nov 14, 2023 at 05:32:51PM +0200, Amir Goldstein wrote: > > In vfs code, sb_start_write() is usually called after the permission hook > > in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule, > > where kiocb_start_write() is called by its callers. > > > > Move kiocb_start_write() from the callers into vfs_iocb_iter_write() > > after the rw_verify_area() checks, to make them "start-write-safe". > > > > This is needed for fanotify "pre content" events. > > > > Suggested-by: Jan Kara <jack@xxxxxxx> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > --- > > fs/cachefiles/io.c | 2 -- > > fs/overlayfs/file.c | 1 - > > fs/read_write.c | 2 ++ > > 3 files changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c > > index 009d23cd435b..3d3667807636 100644 > > --- a/fs/cachefiles/io.c > > +++ b/fs/cachefiles/io.c > > @@ -319,8 +319,6 @@ int __cachefiles_write(struct cachefiles_object *object, > > ki->iocb.ki_complete = cachefiles_write_complete; > > atomic_long_add(ki->b_writing, &cache->b_writing); > > > > - kiocb_start_write(&ki->iocb); > > - > > This bit is subtly wrong, there's a little bit below that does > > ret = cachefiles_inject_write_error(); > if (ret == 0) > ret = vfs_iocb_iter_write(file, &ki->iocb, iter); > > If cachefiles_inject_write_error() returns non-zero it'll fallthrough below and > call cachefiles_write_complete() and complete the kiocb that hasn't started yet. Ouch! good catch! My initial proposal for kiocb_{start,end}_write() has an internal IOCB_WRITE_STARTED flag to protect against this sort of error, but Jens convinced me to remove it. In io_uring and aio.c, IOCB_WRITE is set only after kiocb_start_write(), so kiocb_end_write() is effectively gated by the IOCB_WRITE flag. I could make this change in cachefiles/io.c to use IOCB_WRITE flag as an indication for kiocb_start_write(). It's a bit ugly, but I can't think of anything better and maybe for the purpose of error injection, the hack is not so bad? Thanks, Amir. --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -259,7 +259,8 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret) _enter("%ld", ret); - kiocb_end_write(iocb); + if (iocb->ki_flags & IOCB_WRITE) + kiocb_end_write(iocb); if (ret < 0) trace_cachefiles_io_error(object, inode, ret, @@ -305,7 +306,7 @@ int __cachefiles_write(struct cachefiles_object *object, refcount_set(&ki->ki_refcnt, 2); ki->iocb.ki_filp = file; ki->iocb.ki_pos = start_pos; - ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE; + ki->iocb.ki_flags = IOCB_DIRECT; ki->iocb.ki_ioprio = get_current_ioprio(); ki->object = object; ki->start = start_pos; @@ -319,16 +320,17 @@ int __cachefiles_write(struct cachefiles_object *object, ki->iocb.ki_complete = cachefiles_write_complete; atomic_long_add(ki->b_writing, &cache->b_writing); - kiocb_start_write(&ki->iocb); - get_file(ki->iocb.ki_filp); cachefiles_grab_object(object, cachefiles_obj_get_ioreq); trace_cachefiles_write(object, file_inode(file), ki->iocb.ki_pos, len); old_nofs = memalloc_nofs_save(); ret = cachefiles_inject_write_error(); - if (ret == 0) + if (ret == 0) { + kiocb_start_write(&ki->iocb); + ki->iocb.ki_flags |= IOCB_WRITE; ret = vfs_iocb_iter_write(file, &ki->iocb, iter); + } memalloc_nofs_restore(old_nofs); switch (ret) { case -EIOCBQUEUED: