On Wed, 2021-08-11 at 15:08 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> writes:
>
> > The current code will update the mtime and then try to get caps to
> > handle the write. If we end up having to request caps from the MDS, then
> > the mtime in the cap grant will clobber the updated mtime and it'll be
> > lost.
> >
> > This is most noticeable when two clients are alternately writing to the
> > same file. Fw caps are continually being granted and revoked, and the
> > mtime ends up stuck because the updated mtimes are always being
> > overwritten with the old one.
> >
> > Fix this by changing the order of operations in ceph_write_iter. Get the
> > caps much earlier, and only update the times afterward. Also, make sure
> > we check the NEARFULL conditions before making any changes to the inode.
> >
> > URL: https://tracker.ceph.com/issues/46574
> > Reported-by: Jozef Kováč <kovac@xxxxxxxxxxxxxxx>
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/file.c | 34 +++++++++++++++++-----------------
> >  1 file changed, 17 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index f55ca2c4c7de..5867acfc6a51 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1722,22 +1722,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  		goto out;
> >  	}
> >
> > -	err = file_remove_privs(file);
> > -	if (err)
> > -		goto out;
> > -
> > -	err = file_update_time(file);
> > -	if (err)
> > -		goto out;
> > -
> > -	inode_inc_iversion_raw(inode);
> > -
> > -	if (ci->i_inline_version != CEPH_INLINE_NONE) {
> > -		err = ceph_uninline_data(file, NULL);
> > -		if (err < 0)
> > -			goto out;
> > -	}
> > -
> >  	down_read(&osdc->lock);
> >  	map_flags = osdc->osdmap->flags;
> >  	pool_flags = ceph_pg_pool_flags(osdc->osdmap, ci->i_layout.pool_id);
> > @@ -1748,6 +1732,12 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  		goto out;
> >  	}
> >
> > +	if (ci->i_inline_version != CEPH_INLINE_NONE) {
> > +		err = ceph_uninline_data(file, NULL);
> > +		if (err < 0)
> > +			goto out;
> > +	}
> > +
> >  	dout("aio_write %p %llx.%llx %llu~%zd getting caps. i_size %llu\n",
> >  	     inode, ceph_vinop(inode), pos, count, i_size_read(inode));
> >  	if (fi->fmode & CEPH_FILE_MODE_LAZY)
> > @@ -1759,6 +1749,16 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  	if (err < 0)
> >  		goto out;
> >
> > +	err = file_remove_privs(file);
> > +	if (err)
> > +		goto out_caps;
> > +
> > +	err = file_update_time(file);
> > +	if (err)
> > +		goto out_caps;
>
> Unless I'm missing something (which happens quite frequently!) i_rwsem
> still needs to be released through either ceph_end_io_write() or
> ceph_end_io_direct(). And this isn't being done if we jump to out_caps
> (yeah, goto's spaghetti fun).
>

Good catch! I'll send a v2 in a bit after I test it.

> Also, this patch is probably worth adding to stable@ too, although I
> haven't checked how easy it is to cherry-pick to older kernel versions.
>

I'm not sure it qualifies for stable. We do have an open tracker bug for
it, but the only real problem is that the mtime/change_attr stall out
while there is competing I/O. Definitely broken, but I'm not sure it's
really affecting that many people.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
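
For anyone following along, the unwinding problem pointed out in the review can be modelled in user space. The sketch below is only a simplified illustration: struct fake_inode, start_io_write(), end_io_write(), get_caps(), put_caps() and update_time() are hypothetical stand-ins invented here, not the ceph or VFS API, and write_fixed() shows just one possible shape for a fix, not necessarily what v2 will do. It assumes the normal path drops the inode lock before reaching the out_caps label, so an early jump to that label skips the unlock.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of this is real kernel API. */
struct fake_inode {
	int rwsem_held;	/* models i_rwsem: nonzero while "locked" */
	int caps_held;	/* models cap references taken for the write */
};

/* Stand-ins for ceph_start_io_write()/ceph_end_io_write(). */
static void start_io_write(struct fake_inode *in) { in->rwsem_held++; }
static void end_io_write(struct fake_inode *in) { in->rwsem_held--; }

/* Stand-ins for taking and dropping cap references. */
static int get_caps(struct fake_inode *in) { in->caps_held++; return 0; }
static void put_caps(struct fake_inode *in) { in->caps_held--; }

/* Stand-in for file_update_time(); fails on demand. */
static int update_time(bool fail) { return fail ? -5 : 0; }

/* Shape of the reviewed patch: the error jump skips the unlock. */
static int write_buggy(struct fake_inode *in, bool fail)
{
	int err;

	start_io_write(in);
	err = get_caps(in);
	if (err < 0)
		goto out;

	err = update_time(fail);
	if (err)
		goto out_caps;	/* lock is still held here */

	/* ... the write itself would go here ... */
	end_io_write(in);
out_caps:
	put_caps(in);
	return err;
out:
	end_io_write(in);
	return err;
}

/* One possible fix: drop the lock before jumping to out_caps. */
static int write_fixed(struct fake_inode *in, bool fail)
{
	int err;

	start_io_write(in);
	err = get_caps(in);
	if (err < 0)
		goto out;

	err = update_time(fail);
	if (err) {
		end_io_write(in);
		goto out_caps;
	}

	/* ... the write itself would go here ... */
	end_io_write(in);
out_caps:
	put_caps(in);
	return err;
out:
	end_io_write(in);
	return err;
}
```

Calling write_buggy() with fail set to true on a zeroed struct fake_inode returns an error with rwsem_held still at 1, while write_fixed() leaves both counters at 0. The success path behaves the same in both.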