On Wed, 2021-11-03 at 09:22 +0800, xiubli@xxxxxxxxxx wrote:
> From: Jeff Layton <jlayton@xxxxxxxxxx>
>
> This patch series is based on the "wip-fscrypt-fnames" branch in the
> https://github.com/ceph/ceph-client.git repo.
>
> I have also picked up 5 patches from the "ceph-fscrypt-size-experimental"
> branch in the
> https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git repo.
>
> ====
>
> This approach is based on the discussion from V1 and V2: the client
> passes the encrypted contents of the last block to the MDS along with
> the truncate request.
>
> The encrypted last-block contents are sent with the truncate request
> only when truncating to a smaller size, and that new size is not
> aligned to the fscrypt BLOCK SIZE.
>
> The MDS side of this change is raised in PR
> https://github.com/ceph/ceph/pull/43588, which is also based on Jeff's
> previous great work in PR https://github.com/ceph/ceph/pull/41284.
>
> The MDS will use filer.write_trunc(), which can update and truncate
> the file in one shot, instead of filer.truncate().
>
> This assumes the kclient won't support the inline data feature, which
> will be removed soon. For more detail, please see:
> https://tracker.ceph.com/issues/52916
>
> Changes in V5:
> - Rebase to the "wip-fscrypt-fnames" branch in the ceph-client.git repo.
> - Pick up 5 patches from Jeff's "ceph-fscrypt-size-experimental" branch
>   in the linux.git repo.
> - Add an "i_truncate_pagecache_size" member to struct ceph_inode_info.
>   This is used to truncate the pagecache on the kclient side only,
>   because "i_truncate_size" will always be aligned to the BLOCK SIZE.
>   In the fscrypt case we need the real size to truncate the pagecache.
>
> Changes in V4:
> - Retry the truncate request up to 20 times before failing it with
>   -EAGAIN.
> - Remove the "fill_last_block" label and move the code to the else
>   branch.
> - Drop the #3 patch from the V3 series; it has already been sent out
>   separately.
> - Improve some comments in the code.
> Changes in V3:
> - Fix possible corruption of the file when another client updates it
>   just before the MDS acquires the xlock for the FILE lock.
> - Flush the pagecache buffer before reading the last block when
>   filling the truncate request.
> - Some other minor fixes.
>
> Jeff Layton (5):
>   libceph: add CEPH_OSD_OP_ASSERT_VER support
>   ceph: size handling for encrypted inodes in cap updates
>   ceph: fscrypt_file field handling in MClientRequest messages
>   ceph: get file size from fscrypt_file when present in inode traces
>   ceph: handle fscrypt fields in cap messages from MDS
>
> Xiubo Li (3):
>   ceph: add __ceph_get_caps helper support
>   ceph: add __ceph_sync_read helper support
>   ceph: add truncate size handling support for fscrypt
>
>  fs/ceph/caps.c                  | 136 ++++++++++++++----
>  fs/ceph/crypto.h                |   4 +
>  fs/ceph/dir.c                   |   3 +
>  fs/ceph/file.c                  |  43 ++++--
>  fs/ceph/inode.c                 | 236 +++++++++++++++++++++++++++++---
>  fs/ceph/mds_client.c            |   9 +-
>  fs/ceph/mds_client.h            |   2 +
>  fs/ceph/super.h                 |  10 ++
>  include/linux/ceph/crypto.h     |  28 ++++
>  include/linux/ceph/osd_client.h |   6 +-
>  include/linux/ceph/rados.h      |   4 +
>  net/ceph/osd_client.c           |   5 +
>  12 files changed, 427 insertions(+), 59 deletions(-)
>  create mode 100644 include/linux/ceph/crypto.h

Thanks Xiubo,

This looks like a great start. I set up an environment against a cephadm
cluster with your fscrypt changes, and started running xfstests against
it with test_dummy_encryption enabled.
It got to generic/014 and the test hung waiting on a SETATTR call to
come back:

[root@client1 f3cf8b7a-38ec-11ec-a0e4-52540031ba78.client74208]# cat mdsc
89447   mds0    setattr  #1000003b19c

Looking at the MDS that it was talking to, I see:

Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 31.627241 secs
Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : slow request 31.627240 seconds old, received at 2021-11-03T12:24:37.911553+0000: client_request(client.74208:89447 setattr size=102498304 #0x1000003b19c 2021-11-03T12:24:37.895292+0000 caller_uid=0, caller_gid=0{0,}) currently acquired locks
Nov 03 08:25:14 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 36.627323 secs
Nov 03 08:25:19 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 41.627389 secs

...and it still hasn't resolved. I'll keep looking around a bit more, but
I think there are still some bugs in here. Let me know if you have
thoughts as to what the issue is.

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>