On 11/8/21 10:41 PM, Xiubo Li wrote:
On 11/8/21 10:26 PM, Jeff Layton wrote:
On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote:
Hi Jeff,
After this fix, there is still one bug, which I am still looking
into:
[root@lxbceph1 xfstests]# ./check generic/075
FSTYP -- ceph
PLATFORM -- Linux/x86_64 lxbceph1 5.15.0-rc6+
generic/075 [failed, exit status 1] - output mismatch (see
/mnt/kcephfs/xfstests/results//generic/075.out.bad)
--- tests/generic/075.out 2021-11-08 20:54:07.456980801 +0800
+++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 21:20:49.741906997 +0800
@@ -12,7 +12,4 @@
-----------------------------------------------
fsx.2 : -d -N numops -l filelen -S 0
-----------------------------------------------
-
------------------------------------------------
-fsx.3 : -d -N numops -l filelen -S 0 -x
------------------------------------------------
...
(Run 'diff -u tests/generic/075.out /mnt/kcephfs/xfstests/results//generic/075.out.bad' to see the entire diff)
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests
I checked the output. When the file is truncated to a smaller size
(sizeA) and then extended to a bigger size (sizeB), the contents
between sizeA and sizeB should in theory be zeroed, but they are not.
The update of the last block is correct.
Any ideas?
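For reference, the expected behaviour can be checked outside fsx with a
few lines of Python. This is only a minimal sketch: the function name,
the fill byte and the sizes are made up for illustration, and it only
exercises plain write + truncate, not the mapwrite path.

```python
import os
import tempfile


def stale_offsets_after_shrink_extend(path, orig_size, size_a, size_b):
    """Fill a file with non-zero bytes, shrink it to size_a, extend it to
    size_b, and return the offsets in [size_a, size_b) that are not zero."""
    with open(path, "wb") as f:
        f.write(b"\xea" * orig_size)   # arbitrary non-zero fill byte
    os.truncate(path, size_a)          # shrink: data past size_a is dropped
    os.truncate(path, size_b)          # extend: the new range must read as zeros
    with open(path, "rb") as f:
        data = f.read()
    assert len(data) == size_b
    return [off for off in range(size_a, size_b) if data[off] != 0]


if __name__ == "__main__":
    # Hypothetical sizes; point the path at the filesystem under test.
    with tempfile.TemporaryDirectory() as d:
        bad = stale_offsets_after_shrink_extend(
            os.path.join(d, "probe"), 1 << 20, 0x1234, 0x9876)
        print("stale bytes:", len(bad))
```

An empty list means every byte between sizeA and sizeB reads back as
zero, which is what the test expects.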
Yep, that's the one I saw (intermittently) too. I also saw some failures
around generic/029 and generic/030 that may be related. I haven't dug
down as far into the problem as you have though.
The nice thing about fsx is that it gives you a lot of info about what
it does. There is also a way to replay a series of ops too, so you may
want to try to see if you can make a reliable reproducer for this
problem.
I can reproduce this every time.
Are these truncates running concurrently in different tasks?
No, according to "075.2.fsxlog" they are serialized. The truncates
themselves worked well, but there are also some
mapwrite/write/mapread/read operations between them. From the logs, I
am sure that the non-zeroed contents come from mapwrite/write. It seems
the dirty pages are flushed to the OSDs just after the truncates.
If so, then
we may need some mechanism to ensure that they are serialized vs. one
another.
The truncate holds the 'inode->i_rwsem' lock too, so the
truncate/read/write cannot run in parallel. But I am not sure about
the mapwrite.
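To look at the mapwrite case in isolation, something like the following
could be tried. It is a rough sketch, not what fsx does internally: the
function name, offsets and sizes are hypothetical, and the pages are
dirtied through a shared mapping without an explicit flush before the
truncates.

```python
import mmap
import os
import tempfile


def stale_offsets_after_mapwrite(path, file_size, start, length, size_a, size_b):
    """Dirty [start, start+length) through a shared mapping, then shrink the
    file to size_a and extend it to size_b. Return the offsets in
    [size_a, size_b) that do not read back as zero."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, file_size)
        # "mapwrite": store through the mapping, no explicit flush/msync.
        with mmap.mmap(fd, file_size, access=mmap.ACCESS_WRITE) as mm:
            mm[start:start + length] = b"\x74" * length
        os.ftruncate(fd, size_a)   # shrink below the dirtied range
        os.ftruncate(fd, size_b)   # extend over it again
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        data = f.read()
    return [off for off in range(size_a, size_b) if data[off] != 0]


if __name__ == "__main__":
    # Hypothetical values; use a path on the filesystem under test.
    with tempfile.TemporaryDirectory() as d:
        bad = stale_offsets_after_mapwrite(
            os.path.join(d, "probe"), 1 << 20, 0x40000, 0x8000, 0x20000, 0x80000)
        print("stale bytes:", len(bad))
```

If dirty mapped pages were written back after the truncates, the
returned list would contain offsets inside the mapwrite range.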
The log segment from "075.2.fsxlog":
2317 3728 mapread 0x330312 thru 0x33a6b1 (0xa3a0 bytes)
2318 3730 punch from 0x5ffc53 to 0x608516, (0x88c3 bytes)
2319 3731 write 0x6e9465 thru 0x6f8c04 (0xf7a0 bytes)
2320 3732 mapread 0x49f516 thru 0x49f570 (0x5b bytes)
2321 3733 write 0x72847b thru 0x733f14 (0xba9a bytes)
2322 3736 punch from 0x2a90a0 to 0x2aa68e, (0x15ee bytes)
2323 3739 write 0x644a24 thru 0x64aa30 (0x600d bytes)
2324 3740 trunc from 0x7aa4b0 to 0x9dbef3
2325 3741 mapread 0x5aa6bd thru 0x5b7246 (0xcb8a bytes)
2326 3742 trunc from 0x9dbef3 to 0x718ae4
2327 3743 write 0x3ac9b0 thru 0x3aeee0 (0x2531 bytes)
2328 3744 read 0x6e171c thru 0x6f0fd6 (0xf8bb bytes)
2329 3747 trunc from 0x718ae4 to 0x627ddb
2330 3748 mapread 0xe4e2c thru 0xf0bd5 (0xbdaa bytes)
2331 3752 write 0x71def1 thru 0x71e152 (0x262 bytes)
2332 3753 mapwrite 0x9eb0d8 thru 0x9f4ef7 (0x9e20 bytes)
2333 3754 mapwrite 0x7db56d thru 0x7e1278 (0x5d0c bytes)
2334 3755 punch from 0x9368cb to 0x9437fb, (0xcf30 bytes)
2335 3757 write 0x366827 thru 0x3699ff (0x31d9 bytes)
2336 3761 mapwrite 0x529471 thru 0x52b085 (0x1c15 bytes)
2337 3762 trunc from 0x9f4ef8 to 0x86bfab
2338 3764 write 0x9c85b9 thru 0x9d0bdc (0x8624 bytes)
2339 3765 mapread 0x11b451 thru 0x11fec5 (0x4a75 bytes)
2340 3766 write 0x5938cb thru 0x59e0d0 (0xa806 bytes)
2341 3767 read 0xe3063 thru 0xe8ee7 (0x5e85 bytes)
2342 3768 punch from 0x859f3f to 0x8698ec, (0xf9ad bytes)
2343 3771 punch from 0x86d188 to 0x86eef3, (0x1d6b bytes)
2344 3773 write 0x9f43c9 thru 0x9fffff (0xbc37 bytes)
2345 3774 trunc from 0xa00000 to 0x26d4b9
2346 3777 trunc from 0x26d4b9 to 0x9c695f
2347 3783 trunc from 0x9c695f to 0x9129ed
2348 3784 mapread 0x448402 thru 0x45074d (0x834c bytes)
2349 READ BAD DATA: offset = 0x448402, size = 0x834c, fname = 075.2
2350 OFFSET GOOD BAD RANGE
2351 0x448402 0x0000 0x74ea 0x00000
2352 operation# (mod 256) for the bad data may be 116
2353 0x448403 0x0000 0xea74 0x00001
2354 operation# (mod 256) for the bad data may be 116
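The report above can be cross-checked against the trunc lines with a
small script. This is a sketch under two assumptions: that the second
column of each log line is the fsx operation number, and that the
"operation# (mod 256)" hint is the number of the operation that wrote
the stale byte, modulo 256. The helper names are made up.

```python
import re

# Matches "<op> trunc from 0x... to 0x..."; any leading line-number
# column is skipped because the match starts at the op number.
TRUNC_RE = re.compile(r"(\d+)\s+trunc\s+from\s+(0x[0-9a-f]+)\s+to\s+(0x[0-9a-f]+)")

# The trunc lines copied from the log segment above.
EXCERPT = """\
2324 3740 trunc from 0x7aa4b0 to 0x9dbef3
2326 3742 trunc from 0x9dbef3 to 0x718ae4
2329 3747 trunc from 0x718ae4 to 0x627ddb
2337 3762 trunc from 0x9f4ef8 to 0x86bfab
2345 3774 trunc from 0xa00000 to 0x26d4b9
2346 3777 trunc from 0x26d4b9 to 0x9c695f
2347 3783 trunc from 0x9c695f to 0x9129ed
"""


def last_shrink_covering(log_text, offset):
    """Return (op, old_size, new_size) of the last truncate that cut
    `offset` out of the file, or None if there is none."""
    hit = None
    for m in TRUNC_RE.finditer(log_text):
        op, old, new = int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)
        if new <= offset < old:
            hit = (op, old, new)
    return hit


def latest_op_with_hint(last_op, hint):
    """Largest operation number <= last_op that equals `hint` modulo 256."""
    return last_op - ((last_op - hint) % 256)


if __name__ == "__main__":
    print(last_shrink_covering(EXCERPT, 0x448402))  # (3774, 10485760, 2544825)
    print(latest_op_with_hint(3784, 116))           # 3700
```

With these inputs, the bad offset 0x448402 was cut off by operation 3774
(0xa00000 to 0x26d4b9) and re-exposed by the extend in 3777, so it
should read as zero. The hint 116 is 0x74, the high byte of the bad
value 0x74ea, and the latest matching operation number before the
failing mapread is 3700, which is earlier than the segment shown here.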
On 11/8/21 9:50 PM, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
Hi Jeff,
Patch #1 could be squashed into the previous "ceph: add truncate size
handling support for fscrypt" commit.
Patch #2 could be squashed into the previous "ceph: fscrypt_file field
handling in MClientRequest messages" commit.
Thanks.
Xiubo Li (2):
ceph: fix possible crash and data corrupt bugs
ceph: there is no need to round up the sizes when new size is 0
fs/ceph/inode.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)