On Mon, 2022-02-21 at 18:00 +0000, Pavel Begunkov wrote:
> On 2/21/22 14:16, Dylan Yudaken wrote:
> > In read/write ops, preincrement f_pos when no offset is specified, and
> > then attempt fix up the position after IO completes if it completed less
> > than expected. This fixes the problem where multiple queued up IO will all
> > obtain the same f_pos, and so perform the same read/write.
> >
> > This is still not as consistent as sync r/w, as it is able to advance the
> > file offset past the end of the file. It seems it would be quite a
> > performance hit to work around this limitation - such as by keeping track
> > of concurrent operations - and the downside does not seem to be too
> > problematic.
> >
> > The attempt to fix up the f_pos after will at least mean that in situations
> > where a single operation is run, then the position will be consistent.
> >
> > Co-developed-by: Jens Axboe <axboe@xxxxxxxxx>
> > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > Signed-off-by: Dylan Yudaken <dylany@xxxxxx>
> > ---
> >  fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 68 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index abd8c739988e..a951d0754899 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
>
> [...]
>
> > +				return false;
> >  		}
> >  	}
> > -	return is_stream ? NULL : &kiocb->ki_pos;
> > +	*ppos = is_stream ? NULL : &kiocb->ki_pos;
> > +	return false;
> > +}
> > +
> > +static inline void
> > +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)
>
> That's a lot of inlining, I wouldn't be surprised if the compiler
> will even refuse to do that.
>
> __io_kiocb_done_pos() {
> 	// rest of it
> }
>
> inline io_kiocb_done_pos() {
> 	if (!(flags & CUR_POS))
> 		return;
> 	__io_kiocb_done_pos();
> }
>
> io_kiocb_update_pos() is huge as well

Good idea, will split the slower paths out.

>
> > +{
> > +	u64 expected;
> > +
> > +	if (likely(!(req->flags & REQ_F_CUR_POS)))
> > +		return;
> > +
> > +	expected = req->rw.len;
> > +	if (actual >= expected)
> > +		return;
> > +
> > +	/*
> > +	 * It's not definitely safe to lock here, and the assumption is,
> > +	 * that if we cannot lock the position that it will be changing,
> > +	 * and if it will be changing - then we can't update it anyway
> > +	 */
> > +	if (req->file->f_mode & FMODE_ATOMIC_POS
> > +		&& !mutex_trylock(&req->file->f_pos_lock))
> > +		return;
> > +
> > +	/*
> > +	 * now we want to move the pointer, but only if everything is consistent
> > +	 * with how we left it originally
> > +	 */
> > +	if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
> > +		req->file->f_pos = kiocb->ki_pos;
>
> I wonder, is it good enough / safe to just assign it considering that
> the request was executed outside of locks? vfs_seek()?

No, I do not think so - in the case of multiple r/w the same thing will
happen, even with no vfs_seek().

>
> > +
> > +	/* else something else messed with f_pos and we can't do anything */
> > +
> > +	if (req->file->f_mode & FMODE_ATOMIC_POS)
> > +		mutex_unlock(&req->file->f_pos_lock);
> >  }
>
> Do we even care about races while reading it? E.g.
> pos = READ_ONCE();

I think so - if I remove all the locks the test cases fail.

>
> >
> > -	ppos = io_kiocb_update_pos(req, kiocb);
> > -
> >  	ret = rw_verify_area(READ, req->file, ppos, req->result);
> >  	if (unlikely(ret)) {
> >  		kfree(iovec);
> > +		io_kiocb_done_pos(req, kiocb, 0);
>
> Why do we update it on failure?
>
> [...]
>
> > -	ppos = io_kiocb_update_pos(req, kiocb);
> > -
> >  	ret = rw_verify_area(WRITE, req->file, ppos, req->result);
> >  	if (unlikely(ret))
> >  		goto out_free;
> > @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
> >  		return ret ?: -EAGAIN;
> >  	}
> >  out_free:
> > +	io_kiocb_done_pos(req, kiocb, 0);
>
> Looks weird. It appears we don't need it on failure and
> successes are covered by kiocb_done() / ->ki_complete
>
> >  	/* it's reportedly faster than delegating the null check to kfree() */
> >  	if (iovec)
> >  		kfree(iovec);
>
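
For illustration only, here is a rough user-space sketch of the two ideas
discussed in this thread: claiming a range by advancing f_pos at submission
and rewinding it after a short completion, with the flag test kept in a small
inline wrapper and the rest moved out of line. This is not the kernel code.
All model_* names are hypothetical, locking (f_pos_lock / FMODE_ATOMIC_POS)
is left out entirely, and the sketch assumes that the I/O itself advances
ki_pos by the number of bytes transferred.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for struct file and struct io_kiocb. */
struct model_file {
	int64_t f_pos;		/* shared file position */
};

struct model_req {
	struct model_file *file;
	bool cur_pos;		/* models REQ_F_CUR_POS */
	uint64_t len;		/* bytes requested, models req->rw.len */
	int64_t ki_pos;		/* models kiocb->ki_pos */
};

/* Submission: claim [f_pos, f_pos + len) by advancing f_pos up front. */
static void model_submit(struct model_req *req, struct model_file *file,
			 uint64_t len)
{
	req->file = file;
	req->cur_pos = true;
	req->len = len;
	req->ki_pos = file->f_pos;
	file->f_pos += (int64_t)len;
}

/* Slow path, kept out of line: give back the unused part of the claim. */
static void model_done_pos_slow(struct model_req *req, uint64_t actual)
{
	uint64_t expected = req->len;

	if (actual >= expected)
		return;

	/* Only rewind if f_pos is still where this request left it. */
	if (req->file->f_pos == req->ki_pos + (int64_t)(expected - actual))
		req->file->f_pos = req->ki_pos;
	/* else something else moved f_pos, so it is left alone */
}

/* Fast path: only the flag test is meant to be inlined. */
static inline void model_done_pos(struct model_req *req, uint64_t actual)
{
	if (!req->cur_pos)
		return;
	model_done_pos_slow(req, actual);
}

/* Completion: assume the I/O advanced ki_pos by the bytes transferred. */
static void model_complete(struct model_req *req, uint64_t actual)
{
	req->ki_pos += (int64_t)actual;
	model_done_pos(req, actual);
}
```

In this model, a single 100-byte request that completes with 40 bytes leaves
f_pos at 40. With two 100-byte requests queued, a short completion of the
first leaves f_pos at 200, because the consistency check no longer matches;
that is the remaining inconsistency the commit message describes.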