Jeff Layton <jlayton@xxxxxxxxxx> writes:

> On Wed, 2020-11-11 at 15:39 +0000, Luis Henriques wrote:
>> When doing a rename across quota realms, there's a corner case that isn't
>> handled correctly.  Here's a testcase:
>>
>>   mkdir files limit
>>   truncate files/file -s 10G
>>   setfattr limit -n ceph.quota.max_bytes -v 1000000
>>   mv files limit/
>>
>> The above will succeed because ftruncate(2) won't result in an immediate
>> notification of the MDSs with the new file size, and thus the quota realms
>> stats won't be updated.
>>
>> This patch forces a sync with the MDS every time there's an ATTR_SIZE that
>> sets a new i_size, even if we have Fx caps.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Fixes: dffdcd71458e ("ceph: allow rename operation under different quota realms")
>> URL: https://tracker.ceph.com/issues/36593
>> Signed-off-by: Luis Henriques <lhenriques@xxxxxxx>
>> ---
>>  fs/ceph/inode.c | 11 ++---------
>>  1 file changed, 2 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>> index 526faf4778ce..30e3f240ac96 100644
>> --- a/fs/ceph/inode.c
>> +++ b/fs/ceph/inode.c
>> @@ -2136,15 +2136,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
>>  	if (ia_valid & ATTR_SIZE) {
>>  		dout("setattr %p size %lld -> %lld\n", inode,
>>  		     inode->i_size, attr->ia_size);
>> -		if ((issued & CEPH_CAP_FILE_EXCL) &&
>> -		    attr->ia_size > inode->i_size) {
>> -			i_size_write(inode, attr->ia_size);
>> -			inode->i_blocks = calc_inode_blocks(attr->ia_size);
>> -			ci->i_reported_size = attr->ia_size;
>> -			dirtied |= CEPH_CAP_FILE_EXCL;
>> -			ia_valid |= ATTR_MTIME;
>> -		} else if ((issued & CEPH_CAP_FILE_SHARED) == 0 ||
>> -			   attr->ia_size != inode->i_size) {
>> +		if ((issued & (CEPH_CAP_FILE_EXCL|CEPH_CAP_FILE_SHARED)) ||
>> +		    (attr->ia_size != inode->i_size)) {
>>  			req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
>>  			req->r_args.setattr.old_size =
>>  				cpu_to_le64(inode->i_size);
>
> Hmm...this makes truncates more expensive when we have caps. I'd rather
> not do that if we can help it.

Yeah, as I mentioned in the tracker, there's indeed a performance impact
with this fix.  That's what made me add the RFC in the subject ;-)

> What about instead having the client mimic a fsync when there is a
> rename across quota realms? If we can't tell that reliably then we could
> also just do an effective fsync ahead of any cross-directory rename?

Ok, thanks for the suggestion.  That may actually work, although it will
make the rename more expensive of course.  I'll test that tomorrow and
eventually follow-up with a patch.

Cheers,
-- 
Luis
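
For readers following the diff, here is a minimal userspace sketch of the
ATTR_SIZE decision before and after the RFC patch. It only models the two
conditions visible in the hunk above; it is not kernel code. The names
CAP_SHARED, CAP_EXCL, size_action, old_size_action and new_size_action are
hypothetical, and the bit values are placeholders rather than the real
CEPH_CAP_FILE_* values.

```c
#include <stdint.h>

/* Placeholder bits standing in for CEPH_CAP_FILE_SHARED / CEPH_CAP_FILE_EXCL.
 * These values are assumptions for illustration only. */
#define CAP_SHARED 0x1u
#define CAP_EXCL   0x2u

/* What the client does with a requested size change (hypothetical enum). */
enum size_action {
	SIZE_NOOP,     /* neither branch taken */
	SIZE_LOCAL,    /* update i_size locally, dirty Fx, no MDS round trip */
	SIZE_SYNC_MDS  /* put the new size into the setattr request to the MDS */
};

/* Conditions as they appear on the '-' lines of the hunk. */
static enum size_action old_size_action(unsigned issued, int64_t cur, int64_t req)
{
	if ((issued & CAP_EXCL) && req > cur)
		return SIZE_LOCAL;
	if ((issued & CAP_SHARED) == 0 || req != cur)
		return SIZE_SYNC_MDS;
	return SIZE_NOOP;
}

/* Condition as it appears on the '+' lines of the hunk. */
static enum size_action new_size_action(unsigned issued, int64_t cur, int64_t req)
{
	if ((issued & (CAP_EXCL | CAP_SHARED)) || req != cur)
		return SIZE_SYNC_MDS;
	return SIZE_NOOP;
}
```

In this model, growing a file while holding Fx (the "truncate -s 10G" step
of the testcase) is handled locally by the old condition and goes to the MDS
with the new one, which is where the extra cost discussed above comes from.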