On Fri, 2021-11-05 at 14:36 -0400, Jeff Layton wrote:
> On Fri, 2021-11-05 at 22:22 +0800, xiubli@xxxxxxxxxx wrote:
> > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > 
> > This patch series is based on the "wip-fscrypt-fnames" branch in the
> > repo https://github.com/ceph/ceph-client.git.
> > 
> > I have also picked up 5 patches from the "ceph-fscrypt-size-experimental"
> > branch in the repo
> > https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git.
> > 
> > ====
> > 
> > This approach is based on the discussion from V1 and V2: the client
> > sends the encrypted contents of the last block to the MDS along with
> > the truncate request when truncating to a smaller size that does not
> > align to the BLOCK SIZE.
> > 
> > The MDS side patch is raised in PR
> > https://github.com/ceph/ceph/pull/43588, which is also based on Jeff's
> > previous great work in PR https://github.com/ceph/ceph/pull/41284.
> > 
> > The MDS will use filer.write_trunc(), which can update and truncate
> > the file in one shot, instead of filer.truncate().
> > 
> > This assumes the kclient won't support the inline data feature, which
> > will be removed soon. For more detail, please see:
> > https://tracker.ceph.com/issues/52916
> > 
> > Changed in V7:
> > - Fixed the sparse check warnings.
> > - Removed the include/linux/ceph/crypto.h header file.
> > 
> > Changed in V6:
> > - Fixed the file hole bug; the MDS side PR has also been updated.
> > - Added object version support for sync read in #8.
> > 
> > Changed in V5:
> > - Rebased to the "wip-fscrypt-fnames" branch in the ceph-client.git repo.
> > - Picked up 5 patches from Jeff's "ceph-fscrypt-size-experimental"
> >   branch in the linux.git repo.
> > - Added "i_truncate_pagecache_size" member support in the
> >   ceph_inode_info struct. This is used to truncate the pagecache only
> >   on the kclient side, because "i_truncate_size" will always be
> >   aligned to the BLOCK SIZE, while in the fscrypt case we need the
> >   real size to truncate the pagecache.
> > 
> > Changed in V4:
> > - Retry the truncate request up to 20 times before failing it with
> >   -EAGAIN.
> > - Removed the "fill_last_block" label and moved the code to the else
> >   branch.
> > - Removed patch #3 of the V3 series, which has already been sent out
> >   separately.
> > - Improved some comments in the code.
> > 
> > Changed in V3:
> > - Fixed possibly corrupting the file when another client updates it
> >   just before the MDS acquires the xlock for the FILE lock.
> > - Flush the pagecache buffer before reading the last block when
> >   filling the truncate request.
> > - Some other minor fixes.
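
To illustrate the condition described above, here is a minimal sketch
(not taken from the series; the helper name is hypothetical, and
CEPH_FSCRYPT_BLOCK_SIZE is assumed to be the fscrypt block size) of
when a truncate must carry the encrypted last block to the MDS:

#include <linux/fs.h>     /* loff_t */
#include <linux/align.h>  /* IS_ALIGNED() */

#define CEPH_FSCRYPT_BLOCK_SIZE 4096	/* assumed fscrypt block size */

/*
 * Shrinking an encrypted file to an unaligned size leaves a partial
 * fscrypt block at the new EOF. Only the client can re-encrypt that
 * block, so it must ride along with the truncate request.
 * (Hypothetical helper, for illustration only.)
 */
static bool ceph_trunc_needs_last_block(loff_t old_size, loff_t new_size)
{
	/* growing (or same-size) truncates never need it */
	if (new_size >= old_size)
		return false;
	/* an aligned new size leaves no partial block behind */
	return !IS_ALIGNED(new_size, CEPH_FSCRYPT_BLOCK_SIZE);
}

Aligned shrinks and expanding truncates can be handled by the MDS
alone, since no partial ciphertext block needs rewriting.
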
> > 
> > Jeff Layton (5):
> >   libceph: add CEPH_OSD_OP_ASSERT_VER support
> >   ceph: size handling for encrypted inodes in cap updates
> >   ceph: fscrypt_file field handling in MClientRequest messages
> >   ceph: get file size from fscrypt_file when present in inode traces
> >   ceph: handle fscrypt fields in cap messages from MDS
> > 
> > Xiubo Li (4):
> >   ceph: add __ceph_get_caps helper support
> >   ceph: add __ceph_sync_read helper support
> >   ceph: add object version support for sync read
> >   ceph: add truncate size handling support for fscrypt
> > 
> >  fs/ceph/caps.c                  | 136 ++++++++++++++----
> >  fs/ceph/crypto.h                |  25 ++++
> >  fs/ceph/dir.c                   |   3 +
> >  fs/ceph/file.c                  |  76 ++++++++--
> >  fs/ceph/inode.c                 | 244 +++++++++++++++++++++++++++++---
> >  fs/ceph/mds_client.c            |   9 +-
> >  fs/ceph/mds_client.h            |   2 +
> >  fs/ceph/super.h                 |  25 ++++
> >  include/linux/ceph/osd_client.h |   6 +-
> >  include/linux/ceph/rados.h      |   4 +
> >  net/ceph/osd_client.c           |   5 +
> >  11 files changed, 475 insertions(+), 60 deletions(-)
> > 
> 
> Thanks Xiubo.
> 
> I hit this today after some more testing (generic/014 again):
> 
> [ 1674.146843] libceph: mon0 (2)192.168.1.81:3300 session established
> [ 1674.150902] libceph: client54432 fsid 4e286176-3d8b-11ec-bece-52540031ba78
> [ 1674.153791] ceph: test_dummy_encryption mode enabled
> [ 1719.254308] run fstests generic/014 at 2021-11-05 13:36:26
> [ 1727.157974]
> [ 1727.158334] =====================================
> [ 1727.159219] WARNING: bad unlock balance detected!
> [ 1727.160707] 5.15.0-rc6+ #53 Tainted: G           OE
> [ 1727.162248] -------------------------------------
> [ 1727.171918] truncfile/7800 is trying to release lock (&mdsc->snap_rwsem) at:
> [ 1727.180836] [<ffffffffc127438e>] __ceph_setattr+0x85e/0x1270 [ceph]
> [ 1727.192788] but there are no more locks to release!
> [ 1727.203450]
> [ 1727.203450] other info that might help us debug this:
> [ 1727.220766] 2 locks held by truncfile/7800:
> [ 1727.225548]  #0: ffff888116dd2460 (sb_writers#15){.+.+}-{0:0}, at: do_syscall_64+0x3b/0x90
> [ 1727.234851]  #1: ffff8882d8dac3d0 (&sb->s_type->i_mutex_key#20){++++}-{3:3}, at: do_truncate+0xbe/0x140
> [ 1727.240027]
> [ 1727.240027] stack backtrace:
> [ 1727.247863] CPU: 3 PID: 7800 Comm: truncfile Tainted: G           OE     5.15.0-rc6+ #53
> [ 1727.252508] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-6.fc35 04/01/2014
> [ 1727.257303] Call Trace:
> [ 1727.261503]  dump_stack_lvl+0x57/0x72
> [ 1727.265492]  lock_release.cold+0x49/0x4e
> [ 1727.269499]  ? __ceph_setattr+0x85e/0x1270 [ceph]
> [ 1727.273802]  ? lock_downgrade+0x390/0x390
> [ 1727.277913]  ? preempt_count_sub+0x14/0xc0
> [ 1727.281883]  ? _raw_spin_unlock+0x29/0x40
> [ 1727.285725]  ? __ceph_mark_dirty_caps+0x29f/0x450 [ceph]
> [ 1727.289959]  up_read+0x17/0x20
> [ 1727.293852]  __ceph_setattr+0x85e/0x1270 [ceph]
> [ 1727.297827]  ? ceph_inode_work+0x460/0x460 [ceph]
> [ 1727.301765]  ceph_setattr+0x12d/0x1c0 [ceph]
> [ 1727.305839]  notify_change+0x4e9/0x720
> [ 1727.309762]  ? do_truncate+0xcf/0x140
> [ 1727.313504]  do_truncate+0xcf/0x140
> [ 1727.317092]  ? file_open_root+0x1e0/0x1e0
> [ 1727.321022]  ? lock_release+0x410/0x410
> [ 1727.324769]  ? lock_is_held_type+0xfb/0x130
> [ 1727.328699]  do_sys_ftruncate+0x306/0x350
> [ 1727.332449]  do_syscall_64+0x3b/0x90
> [ 1727.336127]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1727.340303] RIP: 0033:0x7f7f38be356b
> [ 1727.344445] Code: 77 05 c3 0f 1f 40 00 48 8b 15 09 99 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 4d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d9 98 0c 00 f7 d8
> [ 1727.354258] RSP: 002b:00007ffdee35cd18 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
> [ 1727.358964] RAX: ffffffffffffffda RBX: 000000000c8a9d62 RCX: 00007f7f38be356b
> [ 1727.363836] RDX: 000000000c8a9d62 RSI: 000000000c8a9d62 RDI: 0000000000000003
> [ 1727.368467] RBP: 0000000000000003 R08: 000000000000005a R09: 00007f7f38cada60
> [ 1727.373285] R10: 00007f7f38af3700 R11: 0000000000000202 R12: 0000000061856b9d
> [ 1727.377870] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 1727.382578] ------------[ cut here ]------------
> [ 1727.391761] DEBUG_RWSEMS_WARN_ON(tmp < 0): count = 0xffffffffffffff00, magic = 0xffff88812f062220, owner = 0x1, curr 0xffff88810095b280, list empty
> [ 1727.419497] WARNING: CPU: 14 PID: 7800 at kernel/locking/rwsem.c:1297 __up_read+0x404/0x430
> [ 1727.432752] Modules linked in: ceph(OE) libceph(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bridge(E) ip_set(E) stp(E) llc(E) rfkill(E) nf_tables(E) nfnetlink(E) cachefiles(E) fscache(E) netfs(E) sunrpc(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) intel_rapl_msr(E) lpc_ich(E) joydev(E) i2c_i801(E) i2c_smbus(E) virtio_balloon(E) intel_rapl_common(E) fuse(E) zram(E) ip_tables(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) virtio_gpu(E) virtio_blk(E) ghash_clmulni_intel(E) virtio_dma_buf(E) virtio_console(E) serio_raw(E) virtio_net(E) drm_kms_helper(E) net_failover(E) cec(E) failover(E) drm(E) qemu_fw_cfg(E)
> [ 1727.506081] CPU: 1 PID: 7800 Comm: truncfile Tainted: G           OE     5.15.0-rc6+ #53
> [ 1727.512691] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-6.fc35 04/01/2014
> [ 1727.521863] RIP: 0010:__up_read+0x404/0x430
> [ 1727.528011] Code: 48 8b 55 00 4d 89 f0 4c 89 e1 53 48 c7 c6 e0 db 89 b1 48 c7 c7 00 d9 89 b1 65 4c 8b 3c 25 80 fe 01 00 4d 89 f9 e8 a2 4b 08 01 <0f> 0b 5a e9 b4 fd ff ff be 08 00 00 00 4c 89 e7 e8 57 4d 33 00 f0
> [ 1727.540864] RSP: 0018:ffff888118a07bb0 EFLAGS: 00010286
> [ 1727.556265] RAX: 0000000000000000 RBX: ffffffffb189d840 RCX: 0000000000000000
> [ 1727.571003] RDX: 0000000000000001 RSI: ffffffffb1aa6380 RDI: ffffed1023140f6c
> [ 1727.580837] RBP: ffff88812f062220 R08: ffffffffb0185284 R09: ffff8884209ad7c7
> [ 1727.593908] R10: ffffed1084135af8 R11: 0000000000000001 R12: ffff88812f062220
> [ 1727.605431] R13: 1ffff11023140f79 R14: 0000000000000001 R15: ffff88810095b280
> [ 1727.615457] FS:  00007f7f38add740(0000) GS:ffff888420640000(0000) knlGS:0000000000000000
> [ 1727.623346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1727.631138] CR2: 00007fa7810e8000 CR3: 000000012124a000 CR4: 00000000003506e0
> [ 1727.639503] Call Trace:
> [ 1727.649160]  ? _raw_spin_unlock+0x29/0x40
> [ 1727.655583]  ? up_write+0x270/0x270
> [ 1727.661769]  __ceph_setattr+0x85e/0x1270 [ceph]
> [ 1727.670914]  ? ceph_inode_work+0x460/0x460 [ceph]
> [ 1727.677397]  ceph_setattr+0x12d/0x1c0 [ceph]
> [ 1727.685205]  notify_change+0x4e9/0x720
> [ 1727.694598]  ? do_truncate+0xcf/0x140
> [ 1727.705737]  do_truncate+0xcf/0x140
> [ 1727.712680]  ? file_open_root+0x1e0/0x1e0
> [ 1727.720447]  ? lock_release+0x410/0x410
> [ 1727.727851]  ? lock_is_held_type+0xfb/0x130
> [ 1727.734045]  do_sys_ftruncate+0x306/0x350
> [ 1727.740636]  do_syscall_64+0x3b/0x90
> [ 1727.748675]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1727.755634] RIP: 0033:0x7f7f38be356b
> [ 1727.763575] Code: 77 05 c3 0f 1f 40 00 48 8b 15 09 99 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 4d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d9 98 0c 00 f7 d8
> [ 1727.777868] RSP: 002b:00007ffdee35cd18 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
> [ 1727.792610] RAX: ffffffffffffffda RBX: 000000000c8a9d62 RCX: 00007f7f38be356b
> [ 1727.807383] RDX: 000000000c8a9d62 RSI: 000000000c8a9d62 RDI: 0000000000000003
> [ 1727.821520] RBP: 0000000000000003 R08: 000000000000005a R09: 00007f7f38cada60
> [ 1727.829368] R10: 00007f7f38af3700 R11: 0000000000000202 R12: 0000000061856b9d
> [ 1727.837356] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 1727.849767] irq event stamp: 549109
> [ 1727.863878] hardirqs last  enabled at (549109): [<ffffffffb12f8b54>] _raw_spin_unlock_irq+0x24/0x50
> [ 1727.879034] hardirqs last disabled at (549108): [<ffffffffb12f8d04>] _raw_spin_lock_irq+0x54/0x60
> [ 1727.897434] softirqs last  enabled at (548984): [<ffffffffb013d097>] __irq_exit_rcu+0x157/0x1b0
> [ 1727.913276] softirqs last disabled at (548975): [<ffffffffb013d097>] __irq_exit_rcu+0x157/0x1b0
> [ 1727.933182] ---[ end trace a89de5333b156523 ]---
> 
> I think this patch should fix it:
> 
> [PATCH] SQUASH: ensure we unset lock_snap_rwsem after unlocking it
> 
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
>  fs/ceph/inode.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index eebbd0296004..cb0ad0faee45 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -2635,8 +2635,10 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  
>  	release &= issued;
>  	spin_unlock(&ci->i_ceph_lock);
> -	if (lock_snap_rwsem)
> +	if (lock_snap_rwsem) {
>  		up_read(&mdsc->snap_rwsem);
> +		lock_snap_rwsem = false;
> +	}
>  
>  	if (inode_dirty_flags)
>  		__mark_inode_dirty(inode, inode_dirty_flags);

Testing with that patch on top of your latest series looks pretty good
so far. I see some xfstests failures that need to be investigated
(generic/075, in particular). I'll take a harder look at that next week.

For now, I've gone ahead and updated wip-fscrypt-fnames to the latest
fnames branch, and also pushed a new wip-fscrypt-size branch that has
all of your patches, with the above SQUASH patch folded into #9.

I'll continue the testing next week, but I think the -size branch is
probably a good place to work from for now.

Thanks!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
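
The bug class behind the SQUASH patch is a stale "did we lock it?"
flag. A minimal, generic sketch of the pattern (illustrative only, not
the actual __ceph_setattr() code; the function name is made up):

#include <linux/rwsem.h>

static void setattr_like_path(struct rw_semaphore *sem)
{
	bool locked = false;

	down_read(sem);
	locked = true;

	/* mid-function path that must drop the lock early */
	if (locked) {
		up_read(sem);
		/*
		 * Clearing the flag here is the fix; without it the
		 * common cleanup below calls up_read() a second time,
		 * which is exactly the unlock-balance splat above.
		 */
		locked = false;
	}

	/* ... more work that may take other exit paths ... */

	/* common exit cleanup */
	if (locked)
		up_read(sem);
}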