Re: [PATCH] ceph: fscrypt always set the header.block_size to CEPH_FSCRYPT_BLOCK_SIZE

Xiubo Li <xiubli@xxxxxxxxxx> · Wed, 24 Nov 2021 11:00:05 +0800

On 11/23/21 10:27 PM, Jeff Layton wrote:
On Tue, 2021-11-23 at 18:20 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

When hit a file hole, will keep the header.assert_ver as 0, and
in MDS side it will check it to decide whether should it do a
RMW.

And always set the header.block_size to CEPH_FSCRYPT_BLOCK_SIZE,
because even in the hole case, the MDS will need to use this to
do the filer.truncate().

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---


Hi Jeff,

Please squash this patch to the previous "ceph: add truncate size handling support for fscrypt" commit in ceph-client/wip-fscrypt-size branch.

And also please sync the ceph PR, I have updated it too.



Thanks Xiubo. The branches should be updated now. Note that I dropped
this patch since it was causing regressions for me:

     [PATCH] ceph: do not truncate pagecache if truncate size doesn't change

Yeah, sure.


With a kernel from the current wip-fscrypt-size branch, and a ceph
cluster based on your fsize_support branch, I still got a failure on
generic/029:

generic/029 8s ... - output mismatch (see /home/jlayton/git/xfstests-dev/results//generic/029.out.bad)
     --- tests/generic/029.out	2020-11-09 14:47:52.488429897 -0500
     +++ /home/jlayton/git/xfstests-dev/results//generic/029.out.bad	2021-11-23 09:07:18.121501769 -0500
     @@ -24,13 +24,6 @@
      *
      00001400
      ==== Post-Remount ==
     -00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58  |XXXXXXXXXXXXXXXX|
     -*
     -00000400  57 57 57 57 57 57 57 57  57 57 57 57 57 57 57 57  |WWWWWWWWWWWWWWWW|
     -*
     ...
     (Run 'diff -u /home/jlayton/git/xfstests-dev/tests/generic/029.out /home/jlayton/git/xfstests-dev/results//generic/029.out.bad'  to see the entire diff)


Basically, after the second xfs_io test in generic/029, the entire file
was zeroed out. I still am unable to reproduce this every time however.
It mostly seems to occur when I do a full -g quick run. If I run
generic/029 in a loop, it rarely fails.

Here's the test, basically:

# second case is to do a mwrite between the truncate to a block on the
# same page we are truncating within the EOF. This checks that a mapped
# write between truncate down and truncate up a further mapped
# write to the same page into the new space doesn't result in data being lost.
$XFS_IO_PROG -t -f \
-c "truncate 5120"              `# truncate     |                        |` \
-c "pwrite -S 0x58 0 5120"      `# write        |XXXXXXXXXXXXXXXXXXXXXXXX|` \
-c "mmap -rw 0 5120"            `# mmap         |                        |` \
-c "mwrite -S 0x5a 2048 3072"   `# mwrite       |          ZZZZZZZZZZZZZZ|` \
-c "truncate 2048"              `# truncate dn  |         |` \
-c "mwrite -S 0x57 1024 1024"   `# mwrite       |     WWWWW              |` \
-c "truncate 5120"              `# truncate up  |                        |` \
-c "mwrite -S 0x59 2048 3072"   `# mwrite       |          YYYYYYYYYYYYYY|` \
-c "close"      \
$testfile | _filter_xfs_io
  
echo "==== Pre-Remount ==="
hexdump -C $testfile
_scratch_cycle_mount
echo "==== Post-Remount =="
hexdump -C $testfile


I'll keep playing with this today, and see if I can get closer to a
reliable reproducer.

Hi Jeff,

From above logs, it seems saying that the file i_size was reset to 0 
after doing the umount.

In the "handle_cap_grant()" function:

3351 static void handle_cap_grant(struct inode *inode,
3352                              struct ceph_mds_session *session,
3353                              struct ceph_cap *cap,
3354                              struct ceph_mds_caps *grant,
3355                              struct ceph_buffer *xattr_buf,
3356                              struct cap_extra_info *extra_info)
3357         __releases(ci->i_ceph_lock)
3358         __releases(session->s_mdsc->snap_rwsem)
3359 {

...
3375         /*
3376          * If there is at least one crypto block then we'll trust 
fscrypt_file_size.
3377          * If the real length of the file is 0, then ignore it (it 
has probably been
3378          * truncated down to 0 by the MDS).
3379          */
3380         if (IS_ENCRYPTED(inode) && size)
3381                 size = extra_info->fscrypt_file_size;
3382

In Line#3381 it will set the size to fscrypt_file_size, IMO it's 0 in 
some cases.

Checked the MDS code, I didn't find the CInode::encode_cap_message() is 
sending this to kclients in MClientCaps messages. So the 
extra_info->fscrypt_file_size should be 0.

Then in the ceph_fill_file_size() it will update the i_size with 0 in 
case of truncating.

Locally I couldn't reproduce it, could you pull the latest ceph PR and 
retry it ? I have appended one new commit to fix it:

diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index eefc6805505..165fd10de74 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -4139,6 +4139,7 @@ void CInode::encode_cap_message(const 
ref_t<MClientCaps> &m, Capability *cap)
   m->size = i->size;
   m->truncate_seq = i->truncate_seq;
   m->truncate_size = i->truncate_size;
+  m->fscrypt_file = pi->fscrypt_file;
   m->mtime = i->mtime;
   m->atime = i->atime;
   m->ctime = i->ctime;


Thanks

BRs

-- Xiubo




  fs/ceph/inode.c | 14 ++++++++++++--
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 53b8e2ff3678..b4f7a4b4f15c 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2312,6 +2312,12 @@ static int fill_fscrypt_truncate(struct inode *inode,
  	header.ver = 1;
  	header.compat = 1;
  
+	/*
+	 * Always set the block_size to CEPH_FSCRYPT_BLOCK_SIZE,
+	 * because in MDS it may need this to do the truncate.
+	 */
+	header.block_size = cpu_to_le32(CEPH_FSCRYPT_BLOCK_SIZE);
+
  	/*
  	 * If we hit a hole here, we should just skip filling
  	 * the fscrypt for the request, because once the fscrypt
@@ -2327,15 +2333,19 @@ static int fill_fscrypt_truncate(struct inode *inode,
  		     pos, i_size);
  
  		header.data_len = cpu_to_le32(8 + 8 + 4);
+
+		/*
+		 * If the "assert_ver" is 0 means hitting a hole, and
+		 * the MDS will use the it to check whether hitting a
+		 * hole or not.
+		 */
  		header.assert_ver = 0;
  		header.file_offset = 0;
-		header.block_size = 0;
  		ret = 0;
  	} else {
  		header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE);
  		header.assert_ver = cpu_to_le64(objvers.objvers[0].objver);
  		header.file_offset = cpu_to_le64(orig_pos);
-		header.block_size = cpu_to_le32(CEPH_FSCRYPT_BLOCK_SIZE);
  
  		/* truncate and zero out the extra contents for the last block */
  		memset(iov.iov_base + boff, 0, PAGE_SIZE - boff);