Re: [PATCH v7 0/9] ceph: size handling for the fscrypt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2021-11-05 at 22:22 +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> This patch series is based on the "wip-fscrypt-fnames" branch in
> repo https://github.com/ceph/ceph-client.git.
> 
> And I have picked up 5 patches from the "ceph-fscrypt-size-experimental"
> branch in repo
> https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git.
> 
> ====
> 
> This approach is based on the discussion from V1 and V2, which will
> pass the encrypted last block contents to MDS along with the truncate
> request.
> 
> This will send the encrypted last block contents to MDS along with
> the truncate request when truncating to a smaller size and at the
> same time new size does not align to BLOCK SIZE.
> 
> The MDS side patch is raised in PR
> https://github.com/ceph/ceph/pull/43588, which is also based Jeff's
> previous great work in PR https://github.com/ceph/ceph/pull/41284.
> 
> The MDS will use the filer.write_trunc(), which could update and
> truncate the file in one shot, instead of filer.truncate().
> 
> This just assume kclient won't support the inline data feature, which
> will be remove soon, more detail please see:
> https://tracker.ceph.com/issues/52916
> 
> Changed in V7:
> - Fixed the sparse check warnings.
> - Removed the include/linux/ceph/crypto.h header file.
> 
> Changed in V6:
> - Fixed the file hole bug, also have updated the MDS side PR.
> - Add add object version support for sync read in #8.
> 
> 
> Changed in V5:
> - Rebase to "wip-fscrypt-fnames" branch in ceph-client.git repo.
> - Pick up 5 patches from Jeff's "ceph-fscrypt-size-experimental" branch
>   in linux.git repo.
> - Add "i_truncate_pagecache_size" member support in ceph_inode_info
>   struct, this will be used to truncate the pagecache only in kclient
>   side, because the "i_truncate_size" will always be aligned to BLOCK
>   SIZE. In fscrypt case we need to use the real size to truncate the
>   pagecache.
> 
> 
> Changed in V4:
> - Retry the truncate request by 20 times before fail it with -EAGAIN.
> - Remove the "fill_last_block" label and move the code to else branch.
> - Remove the #3 patch, which has already been sent out separately, in
>   V3 series.
> - Improve some comments in the code.
> 
> 
> Changed in V3:
> - Fix possibly corrupting the file just before the MDS acquires the
>   xlock for FILE lock, another client has updated it.
> - Flush the pagecache buffer before reading the last block for the
>   when filling the truncate request.
> - Some other minore fixes.
> 
> 
> 
> Jeff Layton (5):
>   libceph: add CEPH_OSD_OP_ASSERT_VER support
>   ceph: size handling for encrypted inodes in cap updates
>   ceph: fscrypt_file field handling in MClientRequest messages
>   ceph: get file size from fscrypt_file when present in inode traces
>   ceph: handle fscrypt fields in cap messages from MDS
> 
> Xiubo Li (4):
>   ceph: add __ceph_get_caps helper support
>   ceph: add __ceph_sync_read helper support
>   ceph: add object version support for sync read
>   ceph: add truncate size handling support for fscrypt
> 
>  fs/ceph/caps.c                  | 136 ++++++++++++++----
>  fs/ceph/crypto.h                |  25 ++++
>  fs/ceph/dir.c                   |   3 +
>  fs/ceph/file.c                  |  76 ++++++++--
>  fs/ceph/inode.c                 | 244 +++++++++++++++++++++++++++++---
>  fs/ceph/mds_client.c            |   9 +-
>  fs/ceph/mds_client.h            |   2 +
>  fs/ceph/super.h                 |  25 ++++
>  include/linux/ceph/osd_client.h |   6 +-
>  include/linux/ceph/rados.h      |   4 +
>  net/ceph/osd_client.c           |   5 +
>  11 files changed, 475 insertions(+), 60 deletions(-)
> 

Thanks Xiubo.

I hit this today after some more testing (generic/014 again):

[ 1674.146843] libceph: mon0 (2)192.168.1.81:3300 session established
[ 1674.150902] libceph: client54432 fsid 4e286176-3d8b-11ec-bece-52540031ba78
[ 1674.153791] ceph: test_dummy_encryption mode enabled
[ 1719.254308] run fstests generic/014 at 2021-11-05 13:36:26
[ 1727.157974] 
[ 1727.158334] =====================================
[ 1727.159219] WARNING: bad unlock balance detected!
[ 1727.160707] 5.15.0-rc6+ #53 Tainted: G           OE    
[ 1727.162248] -------------------------------------
[ 1727.171918] truncfile/7800 is trying to release lock (&mdsc->snap_rwsem) at:
[ 1727.180836] [<ffffffffc127438e>] __ceph_setattr+0x85e/0x1270 [ceph]
[ 1727.192788] but there are no more locks to release!
[ 1727.203450] 
[ 1727.203450] other info that might help us debug this:
[ 1727.220766] 2 locks held by truncfile/7800:
[ 1727.225548]  #0: ffff888116dd2460 (sb_writers#15){.+.+}-{0:0}, at: do_syscall_64+0x3b/0x90
[ 1727.234851]  #1: ffff8882d8dac3d0 (&sb->s_type->i_mutex_key#20){++++}-{3:3}, at: do_truncate+0xbe/0x140
[ 1727.240027] 
[ 1727.240027] stack backtrace:
[ 1727.247863] CPU: 3 PID: 7800 Comm: truncfile Tainted: G           OE     5.15.0-rc6+ #53
[ 1727.252508] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-6.fc35 04/01/2014
[ 1727.257303] Call Trace:
[ 1727.261503]  dump_stack_lvl+0x57/0x72
[ 1727.265492]  lock_release.cold+0x49/0x4e
[ 1727.269499]  ? __ceph_setattr+0x85e/0x1270 [ceph]
[ 1727.273802]  ? lock_downgrade+0x390/0x390
[ 1727.277913]  ? preempt_count_sub+0x14/0xc0
[ 1727.281883]  ? _raw_spin_unlock+0x29/0x40
[ 1727.285725]  ? __ceph_mark_dirty_caps+0x29f/0x450 [ceph]
[ 1727.289959]  up_read+0x17/0x20
[ 1727.293852]  __ceph_setattr+0x85e/0x1270 [ceph]
[ 1727.297827]  ? ceph_inode_work+0x460/0x460 [ceph]
[ 1727.301765]  ceph_setattr+0x12d/0x1c0 [ceph]
[ 1727.305839]  notify_change+0x4e9/0x720
[ 1727.309762]  ? do_truncate+0xcf/0x140
[ 1727.313504]  do_truncate+0xcf/0x140
[ 1727.317092]  ? file_open_root+0x1e0/0x1e0
[ 1727.321022]  ? lock_release+0x410/0x410
[ 1727.324769]  ? lock_is_held_type+0xfb/0x130
[ 1727.328699]  do_sys_ftruncate+0x306/0x350
[ 1727.332449]  do_syscall_64+0x3b/0x90
[ 1727.336127]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1727.340303] RIP: 0033:0x7f7f38be356b
[ 1727.344445] Code: 77 05 c3 0f 1f 40 00 48 8b 15 09 99 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 4d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d9 98 0c 00 f7 d8
[ 1727.354258] RSP: 002b:00007ffdee35cd18 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
[ 1727.358964] RAX: ffffffffffffffda RBX: 000000000c8a9d62 RCX: 00007f7f38be356b
[ 1727.363836] RDX: 000000000c8a9d62 RSI: 000000000c8a9d62 RDI: 0000000000000003
[ 1727.368467] RBP: 0000000000000003 R08: 000000000000005a R09: 00007f7f38cada60
[ 1727.373285] R10: 00007f7f38af3700 R11: 0000000000000202 R12: 0000000061856b9d
[ 1727.377870] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1727.382578] ------------[ cut here ]------------
[ 1727.391761] DEBUG_RWSEMS_WARN_ON(tmp < 0): count = 0xffffffffffffff00, magic = 0xffff88812f062220, owner = 0x1, curr 0xffff88810095b280, list empty
[ 1727.419497] WARNING: CPU: 14 PID: 7800 at kernel/locking/rwsem.c:1297 __up_read+0x404/0x430
[ 1727.432752] Modules linked in: ceph(OE) libceph(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bridge(E) ip_set(E) stp(E) llc(E) rfkill(E) nf_tables(E) nfnetlink(E) cachefiles(E) fscache(E) netfs(E) sunrpc(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) intel_rapl_msr(E) lpc_ich(E) joydev(E) i2c_i801(E) i2c_smbus(E) virtio_balloon(E) intel_rapl_common(E) fuse(E) zram(E) ip_tables(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) virtio_gpu(E) virtio_blk(E) ghash_clmulni_intel(E) virtio_dma_buf(E) virtio_console(E) serio_raw(E) virtio_net(E) drm_kms_helper(E) net_failover(E) cec(E) failover(E) drm(E) qemu_fw_cfg(E)
[ 1727.506081] CPU: 1 PID: 7800 Comm: truncfile Tainted: G           OE     5.15.0-rc6+ #53
[ 1727.512691] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-6.fc35 04/01/2014
[ 1727.521863] RIP: 0010:__up_read+0x404/0x430
[ 1727.528011] Code: 48 8b 55 00 4d 89 f0 4c 89 e1 53 48 c7 c6 e0 db 89 b1 48 c7 c7 00 d9 89 b1 65 4c 8b 3c 25 80 fe 01 00 4d 89 f9 e8 a2 4b 08 01 <0f> 0b 5a e9 b4 fd ff ff be 08 00 00 00 4c 89 e7 e8 57 4d 33 00 f0
[ 1727.540864] RSP: 0018:ffff888118a07bb0 EFLAGS: 00010286
[ 1727.556265] RAX: 0000000000000000 RBX: ffffffffb189d840 RCX: 0000000000000000
[ 1727.571003] RDX: 0000000000000001 RSI: ffffffffb1aa6380 RDI: ffffed1023140f6c
[ 1727.580837] RBP: ffff88812f062220 R08: ffffffffb0185284 R09: ffff8884209ad7c7
[ 1727.593908] R10: ffffed1084135af8 R11: 0000000000000001 R12: ffff88812f062220
[ 1727.605431] R13: 1ffff11023140f79 R14: 0000000000000001 R15: ffff88810095b280
[ 1727.615457] FS:  00007f7f38add740(0000) GS:ffff888420640000(0000) knlGS:0000000000000000
[ 1727.623346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1727.631138] CR2: 00007fa7810e8000 CR3: 000000012124a000 CR4: 00000000003506e0
[ 1727.639503] Call Trace:
[ 1727.649160]  ? _raw_spin_unlock+0x29/0x40
[ 1727.655583]  ? up_write+0x270/0x270
[ 1727.661769]  __ceph_setattr+0x85e/0x1270 [ceph]
[ 1727.670914]  ? ceph_inode_work+0x460/0x460 [ceph]
[ 1727.677397]  ceph_setattr+0x12d/0x1c0 [ceph]
[ 1727.685205]  notify_change+0x4e9/0x720
[ 1727.694598]  ? do_truncate+0xcf/0x140
[ 1727.705737]  do_truncate+0xcf/0x140
[ 1727.712680]  ? file_open_root+0x1e0/0x1e0
[ 1727.720447]  ? lock_release+0x410/0x410
[ 1727.727851]  ? lock_is_held_type+0xfb/0x130
[ 1727.734045]  do_sys_ftruncate+0x306/0x350
[ 1727.740636]  do_syscall_64+0x3b/0x90
[ 1727.748675]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1727.755634] RIP: 0033:0x7f7f38be356b
[ 1727.763575] Code: 77 05 c3 0f 1f 40 00 48 8b 15 09 99 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 4d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d9 98 0c 00 f7 d8
[ 1727.777868] RSP: 002b:00007ffdee35cd18 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
[ 1727.792610] RAX: ffffffffffffffda RBX: 000000000c8a9d62 RCX: 00007f7f38be356b
[ 1727.807383] RDX: 000000000c8a9d62 RSI: 000000000c8a9d62 RDI: 0000000000000003
[ 1727.821520] RBP: 0000000000000003 R08: 000000000000005a R09: 00007f7f38cada60
[ 1727.829368] R10: 00007f7f38af3700 R11: 0000000000000202 R12: 0000000061856b9d
[ 1727.837356] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1727.849767] irq event stamp: 549109
[ 1727.863878] hardirqs last  enabled at (549109): [<ffffffffb12f8b54>] _raw_spin_unlock_irq+0x24/0x50
[ 1727.879034] hardirqs last disabled at (549108): [<ffffffffb12f8d04>] _raw_spin_lock_irq+0x54/0x60
[ 1727.897434] softirqs last  enabled at (548984): [<ffffffffb013d097>] __irq_exit_rcu+0x157/0x1b0
[ 1727.913276] softirqs last disabled at (548975): [<ffffffffb013d097>] __irq_exit_rcu+0x157/0x1b0
[ 1727.933182] ---[ end trace a89de5333b156523 ]---



I think this patch should fix it:

[PATCH] SQUASH: ensure we unset lock_snap_rwsem after unlocking it

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
 fs/ceph/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index eebbd0296004..cb0ad0faee45 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2635,8 +2635,10 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 
 	release &= issued;
 	spin_unlock(&ci->i_ceph_lock);
-	if (lock_snap_rwsem)
+	if (lock_snap_rwsem) {
 		up_read(&mdsc->snap_rwsem);
+		lock_snap_rwsem = false;
+	}
 
 	if (inode_dirty_flags)
 		__mark_inode_dirty(inode, inode_dirty_flags);
-- 
2.33.1





[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux