On Wed, 2021-11-17 at 09:21 +0800, Xiubo Li wrote: > On 11/17/21 4:06 AM, Jeff Layton wrote: > > On Tue, 2021-11-16 at 17:20 +0800, xiubli@xxxxxxxxxx wrote: > > > From: Xiubo Li <xiubli@xxxxxxxxxx> > > > > > > In case truncating a file to a smaller sizeA, the sizeA will be kept > > > in truncate_size. And if truncate the file to a bigger sizeB, the > > > MDS will only increase the truncate_seq, but still using the sizeA as > > > the truncate_size. > > > > > Do you mean "kept in ci->i_truncate_size" ? > > Sorry for confusing. It mainly will be kept in the MDS side's > CInode->inode.truncate_size. And also will be propagated to all the > clients' ci->i_truncate_size member. > > The MDS will only change CInode->inode.truncate_size when truncating a > smaller size. > > > > If so, is this really the > > correct fix? I'll note this in the sources: > > > > u32 i_truncate_seq; /* last truncate to smaller size */ > > u64 i_truncate_size; /* and the size we last truncated down to */ > > > > Maybe the MDS ought not bump the truncate_seq unless it was truncating > > to a smaller size? If not, then that comment seems wrong at least. > > Yeah, the above comments are inconsistent with what the MDS is doing. > > Okay, I missed reading the code, I found in MDS that is introduced by > commit : > > bf39d32d936 mds: bump truncate seq when fscrypt_file changes > > With the size handling feature support, I think this commit will make no > sense any more since we will calculate the 'truncating_smaller' by not > only comparing the new_size and old_size, which both are rounded up to > FSCRYPT BLOCK SIZE, will also check the 'req->get_data().length()' if > the new_size and old_size are the same. > > > > > > > So when filling the inode it will truncate the pagecache by using > > > truncate_sizeA again, which makes no sense and will trim the inocent > > > pages. > > > > > Is there a reproducer for this? It would be nice to put something in > > xfstests for it if so. > > In xfstests' generic/075 has already testing this, but i didn't see any > issue it reproduce. > > I just found this strange logs when it's doing > something like: > > truncateA 0x10000 --> 0x2000 > > truncateB 0x2000 --> 0x8000 > > truncateC 0x8000 --> 0x6000 > > For the truncateC, the log says: > > ceph: truncate_size 0x2000 -> 0x6000 > > > The problem is that the truncateB will also do the vmtruncate by using > the 0x2000 instead, the vmtruncate will not flush the dirty pages to the > OSD and will just discard them from the pagecaches. Then we may lost > some new updated data in case there has any write before the truncateB > in range [0x2000, 0x8000). > > Is that reproducible without the fscrypt size handling changes? I haven't seen generic/075 fail on stock kernels. If this is a generic bug, then we should go ahead and fix it in mainline. If it's a problem only with fscrypt enabled, then let's plan to roll this patch into those changes. > > > > > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > > > --- > > > fs/ceph/inode.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c > > > index 1b4ce453d397..b4f784684e64 100644 > > > --- a/fs/ceph/inode.c > > > +++ b/fs/ceph/inode.c > > > @@ -738,10 +738,11 @@ int ceph_fill_file_size(struct inode *inode, int issued, > > > * don't hold those caps, then we need to check whether > > > * the file is either opened or mmaped > > > */ > > > - if ((issued & (CEPH_CAP_FILE_CACHE| > > > + if (ci->i_truncate_size != truncate_size && > > > + ((issued & (CEPH_CAP_FILE_CACHE| > > > CEPH_CAP_FILE_BUFFER)) || > > > mapping_mapped(inode->i_mapping) || > > > - __ceph_is_file_opened(ci)) { > > > + __ceph_is_file_opened(ci))) { > > > ci->i_truncate_pending++; > > > queue_trunc = 1; > > > } > -- Jeff Layton <jlayton@xxxxxxxxxx>