On 11/3/21 8:56 PM, Jeff Layton wrote:
On Wed, 2021-11-03 at 09:22 +0800, xiubli@xxxxxxxxxx wrote:
From: Jeff Layton <jlayton@xxxxxxxxxx>
This patch series is based on the "wip-fscrypt-fnames" branch in the
repo https://github.com/ceph/ceph-client.git.
I have also picked up 5 patches from the "ceph-fscrypt-size-experimental"
branch in the repo
https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git.
====
This approach is based on the discussion from V1 and V2: when truncating
to a smaller size that is not aligned to the fscrypt BLOCK SIZE, the
client sends the encrypted contents of the last block to the MDS along
with the truncate request.
The MDS side patch is raised in PR
https://github.com/ceph/ceph/pull/43588, which is also based on Jeff's
previous great work in PR https://github.com/ceph/ceph/pull/41284.
The MDS will use filer.write_trunc(), which can update and truncate the
file in one shot, instead of filer.truncate().
This assumes the kclient won't support the inline data feature, which
will be removed soon. For more detail, please see:
https://tracker.ceph.com/issues/52916
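To make the idea concrete, here is a rough, hypothetical sketch of the
client-side check (not the actual patch code; the helper name and the
CEPH_FSCRYPT_BLOCK_SIZE constant are only assumed for illustration):

#include <linux/fs.h>

/*
 * Hypothetical sketch only: decide whether a size change on an
 * encrypted inode must carry the encrypted contents of the new last
 * block to the MDS.  CEPH_FSCRYPT_BLOCK_SIZE is assumed to be the
 * fscrypt data-unit size.
 */
static bool truncate_needs_last_block(struct inode *inode, loff_t new_size)
{
	if (!IS_ENCRYPTED(inode))
		return false;	/* plaintext file: MDS can truncate on its own */
	if (new_size >= i_size_read(inode))
		return false;	/* not shrinking the file */
	if (IS_ALIGNED(new_size, CEPH_FSCRYPT_BLOCK_SIZE))
		return false;	/* new EOF lands on a block boundary */
	/*
	 * The new last block has to be re-encrypted with its tail zeroed,
	 * so the client reads it, fixes it up and ships it to the MDS in
	 * the truncate request; the MDS then applies it via write_trunc().
	 */
	return true;
}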
Changed in V5:
- Rebase to "wip-fscrypt-fnames" branch in ceph-client.git repo.
- Pick up 5 patches from Jeff's "ceph-fscrypt-size-experimental" branch
in linux.git repo.
- Add "i_truncate_pagecache_size" member support in ceph_inode_info
struct, this will be used to truncate the pagecache only in kclient
side, because the "i_truncate_size" will always be aligned to BLOCK
SIZE. In fscrypt case we need to use the real size to truncate the
pagecache.
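Roughly, the new field would be consumed like this when the client
trims its pagecache (an illustrative sketch, not the patch itself):

#include <linux/fs.h>
#include <linux/mm.h>		/* truncate_pagecache() */
#include "super.h"		/* fs/ceph/super.h: ceph_inode(), ceph_inode_info */

/*
 * Sketch: i_truncate_size is kept aligned to the fscrypt BLOCK SIZE for
 * the OSD-side truncation, so on an encrypted inode the pagecache must
 * be trimmed with the real (unaligned) size, which is what the new
 * i_truncate_pagecache_size field carries.
 */
static void ceph_trim_pagecache_sketch(struct inode *inode)
{
	struct ceph_inode_info *ci = ceph_inode(inode);
	loff_t to = IS_ENCRYPTED(inode) ? ci->i_truncate_pagecache_size
					: ci->i_truncate_size;

	truncate_pagecache(inode, to);
}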
Changed in V4:
- Retry the truncate request up to 20 times before failing it with
-EAGAIN (see the sketch below).
- Remove the "fill_last_block" label and move the code to else branch.
- Remove the #3 patch from the V3 series, which has already been sent
out separately.
- Improve some comments in the code.
Changed in V3:
- Fix possible file corruption when another client updates the file
just before the MDS acquires the xlock on the FILE lock.
- Flush the pagecache buffer before reading the last block when filling
the truncate request (as sketched below).
- Some other minor fixes.
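Putting the V3/V4 items above together, the request path would look
roughly like the following (hypothetical outline; fill_and_send_truncate()
is a made-up placeholder for the real fill/send helpers, and
CEPH_FSCRYPT_BLOCK_SIZE is again assumed):

#include <linux/fs.h>
#include <linux/kernel.h>	/* round_down() */
#include <linux/pagemap.h>	/* filemap_write_and_wait_range() */

#define MAX_TRUNC_RETRIES	20	/* the "20 times" mentioned above */

/*
 * Sketch of the shrinking-truncate path for an encrypted inode: flush
 * any dirty pagecache covering the new last block, read that block
 * back, attach it to the SETATTR request, and retry a bounded number
 * of times if the MDS reports that the file changed underneath us.
 */
static int ceph_truncate_retry_sketch(struct inode *inode, loff_t new_size)
{
	loff_t start = round_down(new_size, CEPH_FSCRYPT_BLOCK_SIZE);
	int tries, ret = -EAGAIN;

	for (tries = 0; tries < MAX_TRUNC_RETRIES && ret == -EAGAIN; tries++) {
		/* write back dirty data before reading the last block */
		ret = filemap_write_and_wait_range(inode->i_mapping, start,
				start + CEPH_FSCRYPT_BLOCK_SIZE - 1);
		if (ret)
			break;

		/*
		 * Read the encrypted last block and send it along with the
		 * truncate request; if another client raced with us (e.g.
		 * the object version assertion fails on the MDS side), the
		 * request comes back with -EAGAIN and we re-read and retry.
		 */
		ret = fill_and_send_truncate(inode, new_size, start); /* placeholder */
	}
	return ret;	/* still -EAGAIN here if every attempt raced */
}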
Jeff Layton (5):
libceph: add CEPH_OSD_OP_ASSERT_VER support
ceph: size handling for encrypted inodes in cap updates
ceph: fscrypt_file field handling in MClientRequest messages
ceph: get file size from fscrypt_file when present in inode traces
ceph: handle fscrypt fields in cap messages from MDS
Xiubo Li (3):
ceph: add __ceph_get_caps helper support
ceph: add __ceph_sync_read helper support
ceph: add truncate size handling support for fscrypt
fs/ceph/caps.c | 136 ++++++++++++++----
fs/ceph/crypto.h | 4 +
fs/ceph/dir.c | 3 +
fs/ceph/file.c | 43 ++++--
fs/ceph/inode.c | 236 +++++++++++++++++++++++++++++---
fs/ceph/mds_client.c | 9 +-
fs/ceph/mds_client.h | 2 +
fs/ceph/super.h | 10 ++
include/linux/ceph/crypto.h | 28 ++++
include/linux/ceph/osd_client.h | 6 +-
include/linux/ceph/rados.h | 4 +
net/ceph/osd_client.c | 5 +
12 files changed, 427 insertions(+), 59 deletions(-)
create mode 100644 include/linux/ceph/crypto.h
Thanks Xiubo,
This looks like a great start. I set up an environment vs. a cephadm
cluster with your fscrypt changes, and started running xfstests against
it with test_dummy_encryption enabled. It got to generic/014 and the
test hung waiting on a SETATTR call to come back:
[root@client1 f3cf8b7a-38ec-11ec-a0e4-52540031ba78.client74208]# cat mdsc
89447 mds0 setattr #1000003b19c
Looking at the MDS that it was talking to, I see:
Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 31.627241 secs
Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : slow request 31.627240 seconds old, received at 2021-11-03T12:24:37.911553+0000: client_request(client.74208:89447 setattr size=102498304 #0x1000003b19c 2021-11-03T12:24:37.895292+0000 caller_uid=0, caller_gid=0{0,}) currently acquired locks
Nov 03 08:25:14 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 36.627323 secs
Nov 03 08:25:19 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 41.627389 secs
...and it still hasn't resolved.
I'll keep looking around a bit more, but I think there are still some
bugs in here. Let me know if you have thoughts as to what the issue is.
From the MDS side log, it keeps retrying the truncate request:
2021-11-04T10:24:25.542+0800 149d48288700 1 --
v1:10.72.47.117:6814/424105754 <== osd.0 v1:10.72.47.117:6800/10035
249354 ==== osd_op_reply(358495 10000000ed7.00000016 [read 92872704~8]
v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 164+0+0
(unknown 4045992944 0 0) 0x55cd75169440 con 0x55cd7514dc00
2021-11-04T10:24:25.542+0800 149d46278700 10 MDSIOContextBase::complete:
24C_IO_MDC_ReadtruncFinish
2021-11-04T10:24:25.542+0800 149d46278700 10 MDSContext::complete:
24C_IO_MDC_ReadtruncFinish
It's a bug when hitting a file hole. I will fix it soon.
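For the record, the -2 in the read reply above is -ENOENT: the new last
block falls in a hole, so there is no backing RADOS object, and the read
of that block should be treated as all zeroes rather than as a failure.
A generic illustration of the idea (not the actual MDS fix):

#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>

/*
 * Illustration only: a missing object (-ENOENT) under a sparse file is
 * a hole, so the caller should see a zero-filled buffer of the
 * requested length instead of an error.
 */
static int last_block_read_done(int rados_ret, char *buf, size_t len)
{
	if (rados_ret == -ENOENT) {
		memset(buf, 0, len);	/* hole: pretend we read zeroes */
		return (int)len;
	}
	return rados_ret;	/* <0 on real error, bytes read otherwise */
}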
Thanks.
BRs
Thanks,