Re: [PATCH 0/2] ceph: misc fixes for the fscrypt truncate size handling


On 11/8/21 10:26 PM, Jeff Layton wrote:
On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote:
Hi Jeff,

After these fixes there is still one bug, which I am still looking into:

[root@lxbceph1 xfstests]# ./check generic/075
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0-rc6+

generic/075     [failed, exit status 1] - output mismatch (see
/mnt/kcephfs/xfstests/results//generic/075.out.bad)
      --- tests/generic/075.out    2021-11-08 20:54:07.456980801 +0800
      +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08
21:20:49.741906997 +0800
      @@ -12,7 +12,4 @@
       -----------------------------------------------
       fsx.2 : -d -N numops -l filelen -S 0
       -----------------------------------------------
      -
      ------------------------------------------------
      -fsx.3 : -d -N numops -l filelen -S 0 -x
      ------------------------------------------------
      ...
      (Run 'diff -u tests/generic/075.out
/mnt/kcephfs/xfstests/results//generic/075.out.bad'  to see the entire diff)
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests


I checked the output. It seems that when the file is truncated to a
smaller size (sizeA) and then extended to a larger size (sizeB), the
contents between sizeA and sizeB should in theory be zeroed, but they are not.
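
The expectation described above can be checked from userspace with a minimal sketch like the one below. It is not the fsx sequence from generic/075 and is not known to trigger the bug; the function name check_truncate_zeroing, the path, and the sizes are hypothetical and chosen only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a file with non-zero data, shrink it to
 * size_a, grow it back to size_b, then verify that [size_a, size_b)
 * reads back as zeros.
 * Returns 0 if the range is zeroed, 1 if stale data is found,
 * and -1 on an I/O or allocation error.
 */
int check_truncate_zeroing(const char *path, off_t size_a, off_t size_b)
{
    assert(size_a >= 0 && size_a < size_b);

    size_t len = (size_t)size_b;
    int result = -1;
    unsigned char *buf = malloc(len);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (!buf || fd < 0)
        goto out;

    /* Populate the whole file with a non-zero pattern and flush it. */
    memset(buf, 0xAB, len);
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        goto out;
    if (fsync(fd) != 0)
        goto out;

    /* Shrink to sizeA, then extend to sizeB. */
    if (ftruncate(fd, size_a) != 0)
        goto out;
    if (ftruncate(fd, size_b) != 0)
        goto out;

    if (pread(fd, buf, len, 0) != (ssize_t)len)
        goto out;

    /* Everything past sizeA must now be zero. */
    result = 0;
    for (off_t i = size_a; i < size_b; i++) {
        if (buf[i] != 0) {
            result = 1;
            break;
        }
    }

out:
    if (fd >= 0)
        close(fd);
    free(buf);
    return result;
}
```

Calling it with a path on the filesystem under test, for example check_truncate_zeroing("/mnt/kcephfs/trunc-test", 1000, 8192), should return 0 on a correctly behaving filesystem. A sizeA that is not block aligned exercises the partial last block.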

The update of the last block is correct.

Any ideas?


Yep, that's the one I saw (intermittently) too. I also saw some failures
around generic/029 and generic/030 that may be related. I haven't dug
down as far into the problem as you have though.

The nice thing about fsx is that it gives you a lot of info about what
it does. There is also a way to replay a series of ops too, so you may
want to try to see if you can make a reliable reproducer for this
problem.

I can reproduce this every time.


Are these truncates running concurrently in different tasks?

No, according to "075.2.fsxlog" they are serialized. The truncates themselves worked fine, but there are also some mapwrite/write/mapread/read operations between them. From the logs, I am sure that the non-zeroed contents come from mapwrite/write. It seems the dirty pages are flushed to the OSDs just after the truncates.
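
The pattern suspected here (pages dirtied through a shared mapping, followed by a shrinking and then a growing truncate, with writeback happening afterwards) can be approximated from userspace as sketched below. This is an assumption about the shape of the failing sequence, not a confirmed reproducer; check_mapwrite_then_truncate, the path, and the sizes are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty the file through a shared mapping without
 * flushing, truncate down to size_a and back up to size_b, force
 * writeback, then verify that [size_a, size_b) reads back as zeros.
 * Returns 0 if the range is zeroed, 1 if stale data is found,
 * and -1 on an I/O, mapping, or allocation error.
 */
int check_mapwrite_then_truncate(const char *path, off_t size_a, off_t size_b)
{
    assert(size_a >= 0 && size_a < size_b);

    size_t len = (size_t)size_b;
    int result = -1;
    unsigned char *map = MAP_FAILED;
    unsigned char *buf = malloc(len);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (!buf || fd < 0)
        goto out;

    /* Size the file so the whole mapping is backed by it. */
    if (ftruncate(fd, size_b) != 0)
        goto out;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* "mapwrite": dirty the pages, deliberately without msync() yet. */
    memset(map, 0xCD, len);

    /* Truncate down and up while the dirty pages are still cached. */
    if (ftruncate(fd, size_a) != 0)
        goto out;
    if (ftruncate(fd, size_b) != 0)
        goto out;

    /* Force writeback after the truncates, then read the file back. */
    if (msync(map, len, MS_SYNC) != 0)
        goto out;
    if (pread(fd, buf, len, 0) != (ssize_t)len)
        goto out;

    result = 0;
    for (off_t i = size_a; i < size_b; i++) {
        if (buf[i] != 0) {
            result = 1;
            break;
        }
    }

out:
    if (map != MAP_FAILED)
        munmap(map, len);
    if (fd >= 0)
        close(fd);
    free(buf);
    return result;
}
```

A return value of 1 would mean that data written through the mapping survived the truncate, which matches the symptom described above. The mapping is not touched after the shrinking truncate, so the sketch avoids SIGBUS from accessing pages beyond end of file.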


  If so, then
we may need some mechanism to ensure that they are serialized vs. one
another.

The truncate also holds the 'inode->i_rwsem' lock, so truncate/read/write cannot run in parallel. I am not sure about mapwrite, though.



On 11/8/21 9:50 PM, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

Hi Jeff,

Patch #1 could be squashed into the previous "ceph: add truncate size handling support for fscrypt" commit.
Patch #2 could be squashed into the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.

Thanks.

Xiubo Li (2):
    ceph: fix possible crash and data corrupt bugs
    ceph: there is no need to round up the sizes when new size is 0

   fs/ceph/inode.c | 8 +++++---
   1 file changed, 5 insertions(+), 3 deletions(-)




