Re: [PATCH] ceph: do not truncate pagecache if truncate size doesn't change

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/17/21 11:06 PM, Jeff Layton wrote:
On Wed, 2021-11-17 at 09:21 +0800, Xiubo Li wrote:
On 11/17/21 4:06 AM, Jeff Layton wrote:
On Tue, 2021-11-16 at 17:20 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

In case truncating a file to a smaller sizeA, the sizeA will be kept
in truncate_size. And if truncate the file to a bigger sizeB, the
MDS will only increase the truncate_seq, but still using the sizeA as
the truncate_size.

Do you mean "kept in ci->i_truncate_size" ?
Sorry for confusing. It mainly will be kept in the MDS side's
CInode->inode.truncate_size. And also will be propagated to all the
clients' ci->i_truncate_size member.

The MDS will only change CInode->inode.truncate_size when truncating a
smaller size.


If so, is this really the
correct fix? I'll note this in the sources:

          u32 i_truncate_seq;        /* last truncate to smaller size */
          u64 i_truncate_size;       /*  and the size we last truncated down to */

Maybe the MDS ought not bump the truncate_seq unless it was truncating
to a smaller size? If not, then that comment seems wrong at least.
Yeah, the above comments are inconsistent with what the MDS is doing.

Okay, I missed reading the code, I found in MDS that is introduced by
commit :

       bf39d32d936 mds: bump truncate seq when fscrypt_file changes

With the size handling feature support, I think this commit will make no
sense any more since we will calculate the 'truncating_smaller' by not
only comparing the new_size and old_size, which both are rounded up to
FSCRYPT BLOCK SIZE, will also check the 'req->get_data().length()' if
the new_size and old_size are the same.


So when filling the inode it will truncate the pagecache by using
truncate_sizeA again, which makes no sense and will trim the inocent
pages.

Is there a reproducer for this? It would be nice to put something in
xfstests for it if so.
In xfstests' generic/075 has already testing this, but i didn't see any
issue it reproduce. I just found this strange logs when it's doing
something like:

truncateA 0x10000 --> 0x2000

truncateB 0x2000   --> 0x8000

truncateC 0x8000   --> 0x6000

For the truncateC, the log says:

ceph:  truncate_size 0x2000 -> 0x6000


The problem is that the truncateB will also do the vmtruncate by using
the 0x2000 instead, the vmtruncate will not flush the dirty pages to the
OSD and will just discard them from the pagecaches. Then we may lost
some new updated data in case there has any write before the truncateB
in range [0x2000, 0x8000).


Thanks

BRs

-- Xiubo


I tested this today and was still able to reproduce failures in
generic/029 and generic/075 with test_dummy_encryption enabled.

Hi Jeff,

I tested these two cases many times again today and both worked well for me.

[root@lxbceph1 xfstests]# ./check generic/075
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0+
MKFS_OPTIONS  -- 10.72.7.17:40543:/testB
MOUNT_OPTIONS -- -o test_dummy_encryption,name=admin,secret=AQDS3IFhEtxvORAAxn1d4FVN2bRUsc/TZMpQvQ== -o context=system_u:object_r:root_t:s0 10.72.7.17:40543:/testB /mnt/kcephfs/testD

generic/075 106s ... 356s
Ran: generic/075
Passed all 1 tests

[root@lxbceph1 xfstests]# ./check generic/029
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0+
MKFS_OPTIONS  -- 10.72.7.17:40543:/testB
MOUNT_OPTIONS -- -o test_dummy_encryption,name=admin,secret=AQDS3IFhEtxvORAAxn1d4FVN2bRUsc/TZMpQvQ== -o context=system_u:object_r:root_t:s0 10.72.7.17:40543:/testB /mnt/kcephfs/testD

generic/029 4s ... 3s
Ran: generic/029
Passed all 1 tests


On the cluster-side, I'm using a cephadm cluster built using an image
based on your fsize_support branch, rebased onto master (the Oct 7 base
you're using is not good for cephadm).

I have updated this branch last night by rebasing it onto the latest upstream master.

And at the same time I have removed the commit:

        bf39d32d936 mds: bump truncate seq when fscrypt_file changes

On the client side, I'm using the ceph-client/wip-fscrypt-size branch,
along with this patch on top.

This I am also using the same branch from ceph-client repo. Nothing changed in my side.

To be safe I just deleted my local branches and synced from ceph-client repo today and test them again, still the same and worked for me.


Xiubo, could you push branches with the current state of client and
server patches that you're using to test this? Maybe that will help
explain why I can still reproduce these problems and you can't.

The following is my config for 'local.config' in xfstests:

[root@lxbceph1 ~]# cat /mnt/kcephfs/xfstests/local.config
export FSTYP=ceph
export TEST_DEV=10.72.7.17:40543:/testA
export SCRATCH_DEV=10.72.7.17:40543:/testB
export TEST_DIR=/mnt/kcephfs/testC
export SCRATCH_MNT=/mnt/kcephfs/testD
export CEPHFS_MOUNT_OPTIONS="-o test_dummy_encryption,name=admin,secret=AQDS3IFhEtxvORAAxn1d4FVN2bRUsc/TZMpQvQ=="


And I cloned the xfstests from:

# wget http://download.ceph.com/qa/xfstests.tar.gz


BTW, what's the failure for generic/075 ? Is it the same one as before ?

On my side, before it failed just after 3 'trunc' in "075.2.fsxlog":

2317 3728 mapread    0x330312 thru   0x33a6b1        (0xa3a0 bytes)
2318 3730 punch      from 0x5ffc53 to 0x608516, (0x88c3 bytes)
2319 3731 write      0x6e9465 thru   0x6f8c04        (0xf7a0 bytes)
2320 3732 mapread    0x49f516 thru   0x49f570        (0x5b bytes)
2321 3733 write      0x72847b thru   0x733f14        (0xba9a bytes)
2322 3736 punch      from 0x2a90a0 to 0x2aa68e, (0x15ee bytes)
2323 3739 write      0x644a24 thru   0x64aa30        (0x600d bytes)
2324 3740 trunc      from 0x7aa4b0 to 0x9dbef3
2325 3741 mapread    0x5aa6bd thru   0x5b7246        (0xcb8a bytes)
2326 3742 trunc      from 0x9dbef3 to 0x718ae4
2327 3743 write      0x3ac9b0 thru   0x3aeee0        (0x2531 bytes)
2328 3744 read       0x6e171c thru   0x6f0fd6        (0xf8bb bytes)
2329 3747 trunc      from 0x718ae4 to 0x627ddb
2330 3748 mapread    0xe4e2c thru    0xf0bd5 (0xbdaa bytes)
2331 3752 write      0x71def1 thru   0x71e152        (0x262 bytes)
2332 3753 mapwrite   0x9eb0d8 thru   0x9f4ef7        (0x9e20 bytes)
2333 3754 mapwrite   0x7db56d thru   0x7e1278        (0x5d0c bytes)
2334 3755 punch      from 0x9368cb to 0x9437fb, (0xcf30 bytes)
2335 3757 write      0x366827 thru   0x3699ff        (0x31d9 bytes)
2336 3761 mapwrite   0x529471 thru   0x52b085        (0x1c15 bytes)
2337 3762 trunc      from 0x9f4ef8 to 0x86bfab
2338 3764 write      0x9c85b9 thru   0x9d0bdc        (0x8624 bytes)
2339 3765 mapread    0x11b451 thru   0x11fec5        (0x4a75 bytes)
2340 3766 write      0x5938cb thru   0x59e0d0        (0xa806 bytes)
2341 3767 read       0xe3063 thru    0xe8ee7 (0x5e85 bytes)
2342 3768 punch      from 0x859f3f to 0x8698ec, (0xf9ad bytes)
2343 3771 punch      from 0x86d188 to 0x86eef3, (0x1d6b bytes)
2344 3773 write      0x9f43c9 thru   0x9fffff        (0xbc37 bytes)
2345 3774 trunc      from 0xa00000 to 0x26d4b9
2346 3777 trunc      from 0x26d4b9 to 0x9c695f
2347 3783 trunc      from 0x9c695f to 0x9129ed
2348 3784 mapread    0x448402 thru   0x45074d        (0x834c bytes)
2349 READ BAD DATA: offset = 0x448402, size = 0x834c, fname = 075.2
2350 OFFSET  GOOD    BAD     RANGE
2351 0x448402        0x0000  0x74ea  0x00000
2352 operation# (mod 256) for the bad data may be 116
2353 0x448403        0x0000  0xea74  0x00001
2354 operation# (mod 256) for the bad data may be 116


Thanks

-- Xiubo


Thanks,
Jeff

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
   fs/ceph/inode.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 1b4ce453d397..b4f784684e64 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -738,10 +738,11 @@ int ceph_fill_file_size(struct inode *inode, int issued,
   			 * don't hold those caps, then we need to check whether
   			 * the file is either opened or mmaped
   			 */
-			if ((issued & (CEPH_CAP_FILE_CACHE|
+			if (ci->i_truncate_size != truncate_size &&
+			    ((issued & (CEPH_CAP_FILE_CACHE|
   				       CEPH_CAP_FILE_BUFFER)) ||
   			    mapping_mapped(inode->i_mapping) ||
-			    __ceph_is_file_opened(ci)) {
+			    __ceph_is_file_opened(ci))) {
   				ci->i_truncate_pending++;
   				queue_trunc = 1;
   			}





[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux