Re: [PATCH] ceph: do not truncate pagecache if truncate size doesn't change

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2021-11-17 at 09:21 +0800, Xiubo Li wrote:
> On 11/17/21 4:06 AM, Jeff Layton wrote:
> > On Tue, 2021-11-16 at 17:20 +0800, xiubli@xxxxxxxxxx wrote:
> > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > 
> > > In case truncating a file to a smaller sizeA, the sizeA will be kept
> > > in truncate_size. And if truncate the file to a bigger sizeB, the
> > > MDS will only increase the truncate_seq, but still using the sizeA as
> > > the truncate_size.
> > > 
> > Do you mean "kept in ci->i_truncate_size" ?
> 
> Sorry for confusing. It mainly will be kept in the MDS side's 
> CInode->inode.truncate_size. And also will be propagated to all the 
> clients' ci->i_truncate_size member.
> 
> The MDS will only change CInode->inode.truncate_size when truncating a 
> smaller size.
> 
> 
> > If so, is this really the
> > correct fix? I'll note this in the sources:
> > 
> >          u32 i_truncate_seq;        /* last truncate to smaller size */
> >          u64 i_truncate_size;       /*  and the size we last truncated down to */
> > 
> > Maybe the MDS ought not bump the truncate_seq unless it was truncating
> > to a smaller size? If not, then that comment seems wrong at least.
> 
> Yeah, the above comments are inconsistent with what the MDS is doing.
> 
> Okay, I missed reading the code, I found in MDS that is introduced by 
> commit :
> 
>       bf39d32d936 mds: bump truncate seq when fscrypt_file changes
> 
> With the size handling feature support, I think this commit will make no 
> sense any more since we will calculate the 'truncating_smaller' by not 
> only comparing the new_size and old_size, which both are rounded up to 
> FSCRYPT BLOCK SIZE, will also check the 'req->get_data().length()' if 
> the new_size and old_size are the same.
> 
> 
> > 
> > > So when filling the inode it will truncate the pagecache by using
> > > truncate_sizeA again, which makes no sense and will trim the inocent
> > > pages.
> > > 
> > Is there a reproducer for this? It would be nice to put something in
> > xfstests for it if so.
> 
> In xfstests' generic/075 has already testing this, but i didn't see any 
> issue it reproduce.
> 
> I just found this strange logs when it's doing 
> something like:
> 
> truncateA 0x10000 --> 0x2000
> 
> truncateB 0x2000   --> 0x8000
> 
> truncateC 0x8000   --> 0x6000
> 
> For the truncateC, the log says:
> 
> ceph:  truncate_size 0x2000 -> 0x6000
> 
> 
> The problem is that the truncateB will also do the vmtruncate by using 
> the 0x2000 instead, the vmtruncate will not flush the dirty pages to the 
> OSD and will just discard them from the pagecaches. Then we may lost 
> some new updated data in case there has any write before the truncateB 
> in range [0x2000, 0x8000).
> 
> 

Is that reproducible without the fscrypt size handling changes? I
haven't seen generic/075 fail on stock kernels.

If this is a generic bug, then we should go ahead and fix it in
mainline. If it's a problem only with fscrypt enabled, then let's plan
to roll this patch into those changes.

> 
> 
> > > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> > > ---
> > >   fs/ceph/inode.c | 5 +++--
> > >   1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > > index 1b4ce453d397..b4f784684e64 100644
> > > --- a/fs/ceph/inode.c
> > > +++ b/fs/ceph/inode.c
> > > @@ -738,10 +738,11 @@ int ceph_fill_file_size(struct inode *inode, int issued,
> > >   			 * don't hold those caps, then we need to check whether
> > >   			 * the file is either opened or mmaped
> > >   			 */
> > > -			if ((issued & (CEPH_CAP_FILE_CACHE|
> > > +			if (ci->i_truncate_size != truncate_size &&
> > > +			    ((issued & (CEPH_CAP_FILE_CACHE|
> > >   				       CEPH_CAP_FILE_BUFFER)) ||
> > >   			    mapping_mapped(inode->i_mapping) ||
> > > -			    __ceph_is_file_opened(ci)) {
> > > +			    __ceph_is_file_opened(ci))) {
> > >   				ci->i_truncate_pending++;
> > >   				queue_trunc = 1;
> > >   			}
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux