On 11/18/21 12:46 PM, Xiubo Li wrote:
On 11/18/21 5:10 AM, Jeff Layton wrote:
On Tue, 2021-11-16 at 17:20 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
When truncating a file to a smaller sizeA, sizeA is kept in
truncate_size. If the file is then truncated to a bigger sizeB, the
MDS will only increase the truncate_seq, but will still keep sizeA as
the truncate_size.
So when filling the inode we would truncate the pagecache with the
stale truncate_size (sizeA) again, which makes no sense and trims
innocent pages.
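[ Not part of the patch: a minimal userspace sketch of the sequence
described above, assuming a file on a CephFS mount; the path and the
sizes are made up for illustration. ]

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/kcephfs/file", O_RDWR | O_CREAT, 0644);	/* hypothetical path */

	if (fd < 0)
		return 1;

	ftruncate(fd, 8192);	/* give the file some size first */
	ftruncate(fd, 4096);	/* shrink to sizeA: truncate_size becomes 4096 */
	ftruncate(fd, 16384);	/* grow to sizeB: MDS bumps truncate_seq only,
				 * truncate_size stays at 4096 (sizeA) */
	close(fd);
	return 0;
}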
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/inode.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 1b4ce453d397..b4f784684e64 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -738,10 +738,11 @@ int ceph_fill_file_size(struct inode *inode, int issued,
 			 * don't hold those caps, then we need to check whether
 			 * the file is either opened or mmaped
 			 */
-			if ((issued & (CEPH_CAP_FILE_CACHE|
+			if (ci->i_truncate_size != truncate_size &&
+			    ((issued & (CEPH_CAP_FILE_CACHE|
 				       CEPH_CAP_FILE_BUFFER)) ||
 			    mapping_mapped(inode->i_mapping) ||
-			    __ceph_is_file_opened(ci)) {
+			     __ceph_is_file_opened(ci))) {
 				ci->i_truncate_pending++;
 				queue_trunc = 1;
 			}
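[ For readability: roughly how the check reads once the hunk above is
applied (reconstructed from the diff, indentation approximate). The
idea is to skip queueing a pagecache truncation when truncate_size has
not actually changed, e.g. when the MDS only bumped truncate_seq for a
size increase. ]

	if (ci->i_truncate_size != truncate_size &&
	    ((issued & (CEPH_CAP_FILE_CACHE|
			CEPH_CAP_FILE_BUFFER)) ||
	     mapping_mapped(inode->i_mapping) ||
	     __ceph_is_file_opened(ci))) {
		/* mark that a pagecache truncation is still pending */
		ci->i_truncate_pending++;
		queue_trunc = 1;
	}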
This patch causes xfstest generic/129 to hang at umount time when
applied on top of the testing branch and run (without fscrypt being
enabled). The call stack looks like this:
[<0>] wb_wait_for_completion+0xc3/0x120
[<0>] __writeback_inodes_sb_nr+0x151/0x190
[<0>] sync_filesystem+0x59/0x100
[<0>] generic_shutdown_super+0x44/0x1d0
[<0>] kill_anon_super+0x1e/0x40
[<0>] ceph_kill_sb+0x5f/0xc0 [ceph]
[<0>] deactivate_locked_super+0x5d/0xd0
[<0>] cleanup_mnt+0x1f4/0x260
[<0>] task_work_run+0x8b/0xc0
[<0>] exit_to_user_mode_prepare+0x267/0x270
[<0>] syscall_exit_to_user_mode+0x16/0x50
[<0>] do_syscall_64+0x48/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
I suspect this is causing dirty data to get stuck in the cache somehow,
but I haven't tracked down the cause in detail.
BTW, could you reproduce this every time?
I have tried this based on the "ceph-client/wip-fscrypt-size" branch,
with "test_dummy_encryption" both enabled and disabled, many times,
and it worked well for me every time.
I also tried this patch based on the "testing" branch without fscrypt
enabled, many times, and it also worked well for me:
[root@lxbceph1 xfstests]# date; ./check generic/129; date
Thu Nov 18 12:22:25 CST 2021
FSTYP -- ceph
PLATFORM -- Linux/x86_64 lxbceph1 5.15.0+
MKFS_OPTIONS -- 10.72.7.17:40543:/testB
MOUNT_OPTIONS -- -o name=admin,secret=AQDS3IFhEtxvORAAxn1d4FVN2bRUsc/TZMpQvQ== -o context=system_u:object_r:root_t:s0 10.72.47.117:40543:/testB /mnt/kcephfs/testD
generic/129 648s ... 603s
Ran: generic/129
Passed all 1 tests
Thu Nov 18 12:32:33 CST 2021
I have been running this for several hours; so far no hang has occurred locally:
$ while [ 1 ]; do date; ./check generic/129; date; done
Is it possible that you were still using the old binaries you built?
Thanks
BRs
-- Xiubo