Re: [RFC 00/23] Enable block size > page size in XFS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 20, 2023 at 09:57:56PM -0700, Luis Chamberlain wrote:
> On Wed, Sep 20, 2023 at 08:00:12PM -0700, Luis Chamberlain wrote:
> > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus
> > 
> > I haven't tested yet the second branch I pushed though but it applied without any changes
> > so it should be good (usual famous last words).
> 
> I have run some preliminary tests on that branch as well above using fsx
> with larger LBA formats running them all on the *same* system at the
> same time. Kernel is happy.
> 
> root@linus ~ # uname -r
> 6.6.0-rc2-large-block-linus+
> 
> root@linus ~ # mount | grep mnt
> /dev/nvme17n1 on /mnt-16k type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme13n1 on /mnt-32k-16ks type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme11n1 on /mnt-64k-16ks type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=64k,noquota)
> /dev/nvme18n1 on /mnt-32k type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme14n1 on /mnt-64k-32ks type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=64k,noquota)
> /dev/nvme7n1 on /mnt-64k-512b type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme4n1 on /mnt-32k-512 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme3n1 on /mnt-16k-512b type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme9n1 on /mnt-64k-4ks type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=64k,noquota)
> /dev/nvme8n1 on /mnt-32k-4ks type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme6n1 on /mnt-16k-4ks type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme5n1 on /mnt-4k type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/nvme1n1 on /mnt-512 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> 
> root@linus ~ # ps -ef| grep fsx
> root       45601   45172 44 04:02 pts/3    00:20:26 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-16k/foo
> root       46207   45658 39 04:04 pts/5    00:17:18 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-32k-16ks/foo
> root       46792   46289 35 04:06 pts/7    00:14:36 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-64k-16ks/foo
> root       47293   46899 39 04:08 pts/9    00:15:30 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-32k/foo
> root       47921   47338 34 04:10 pts/11   00:12:56 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-64k-32ks/foo
> root       48898   48484 32 04:14 pts/13   00:10:56 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-64k-512b/foo
> root       49313   48939 35 04:15 pts/15   00:11:38 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-32k-512/foo
> root       49729   49429 40 04:17 pts/17   00:12:27 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-16k-512b/foo
> root       50085   49794 33 04:18 pts/19   00:09:56 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-64k-4ks/foo
> root       50449   50130 36 04:19 pts/21   00:10:28 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-32k-4ks/foo
> root       50844   50517 41 04:20 pts/23   00:11:22 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-16k-4ks/foo
> root       51135   50893 52 04:21 pts/25   00:13:57 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-4k/foo
> root       52061   51193 49 04:25 pts/27   00:11:21 /var/lib/xfstests/ltp/fsx -q -S 0 -p 1000000 /mnt-512/foo
> root       57668   52131  0 04:48 pts/29   00:00:00 grep fsx

So I just pulled this, built it and run generic/091 as the very
first test on this:

# ./run_check.sh --mkfs-opts "-m rmapbt=1 -b size=64k" --run-opts "-s xfs_64k generic/091"
.....
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=2613, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
....
Running: MOUNT_OPTIONS= ./check -R xunit -b -s xfs_64k generic/091
SECTION       -- xfs_64k
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 test3 6.6.0-rc2-large-block-linus-dgc+ #1906 SMP PREEMPT_DYNAMIC Thu Sep 21 15:19:47 AEST 2023
MKFS_OPTIONS  -- -f -m rmapbt=1 -b size=64k /dev/pmem1
MOUNT_OPTIONS -- -o dax=never -o context=system_u:object_r:root_t:s0 /dev/pmem1 /mnt/scratch

generic/091 10s ... [failed, exit status 1]- output mismatch (see /home/dave/src/xfstests-dev/results//xfs_64k/generic/091.out.bad)
    --- tests/generic/091.out   2022-12-21 15:53:25.467044754 +1100
    +++ /home/dave/src/xfstests-dev/results//xfs_64k/generic/091.out.bad        2023-09-21 15:47:48.222559248 +1000
    @@ -1,7 +1,113 @@
     QA output created by 091
     fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
    -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
    -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
    -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
    -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
    -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
    ...
    (Run 'diff -u /home/dave/src/xfstests-dev/tests/generic/091.out /home/dave/src/xfstests-dev/results//xfs_64k/generic/091.out.bad'  to see the entire diff)
Failures: generic/091
Failed 1 of 1 tests
Xunit report: /home/dave/src/xfstests-dev/results//xfs_64k/result.xml

SECTION       -- xfs_64k
=========================
Failures: generic/091
Failed 1 of 1 tests


real    0m4.214s
user    0m0.972s
sys     0m3.603s
#

For all these assertions about how none of your testing is finding
bugs in this code, It's taken me *4 seconds* of test runtime to find
the first failure.

And, well, it's the same failure as I reported for the previous
version of this code:

# cat /home/dave/src/xfstests-dev/results//xfs_64k/generic/091.out.bad
/home/dave/src/xfstests-dev/ltp/fsx -N 10000 -l 500000 -r 4096 -t 512 -w 512 -Z -R -W /mnt/test/junk
mapped writes DISABLED    
Seed set to 1
main: filesystem does not support exchange range, disabling!
fallocating to largest ever: 0x79f06
READ BAD DATA: offset = 0x18000, size = 0xf000, fname = /mnt/test/junk
OFFSET      GOOD    BAD     RANGE
0x21000     0x0000  0x9008  0x0
operation# (mod 256) for the bad data may be 144
0x21001     0x0000  0x0810  0x1
operation# (mod 256) for the bad data may be 16
0x21002     0x0000  0x1000  0x2
operation# (mod 256) for the bad data may be 16
0x21005     0x0000  0x8e00  0x3
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x21007     0x0000  0x82ff  0x4
operation# (mod 256) for the bad data may be 255
0x21008     0x0000  0xffff  0x5
operation# (mod 256) for the bad data may be 255
0x21009     0x0000  0xffff  0x6
operation# (mod 256) for the bad data may be 255
0x2100a     0x0000  0xffff  0x7
operation# (mod 256) for the bad data may be 255
0x2100b     0x0000  0xff00  0x8
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x21010     0x0000  0x700b  0x9
operation# (mod 256) for the bad data may be 112
0x21011     0x0000  0x0b10  0xa
operation# (mod 256) for the bad data may be 16
0x21012     0x0000  0x1000  0xb
operation# (mod 256) for the bad data may be 16
0x21014     0x0000  0x038e  0xc
operation# (mod 256) for the bad data may be 3
0x21015     0x0000  0x8e00  0xd
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x21017     0x0000  0x82ff  0xe
operation# (mod 256) for the bad data may be 255
0x21018     0x0000  0xffff  0xf
operation# (mod 256) for the bad data may be 255
LOG DUMP (69 total operations):
1(  1 mod 256): FALLOC   0x6ba10 thru 0x79f06   (0xe4f6 bytes) EXTENDING
2(  2 mod 256): SKIPPED (no operation)
3(  3 mod 256): SKIPPED (no operation)
4(  4 mod 256): TRUNCATE DOWN   from 0x79f06 to 0x51800
5(  5 mod 256): SKIPPED (no operation)
6(  6 mod 256): READ     0x1b000 thru 0x21fff   (0x7000 bytes)
7(  7 mod 256): PUNCH    0x2ce7a thru 0x39b9e   (0xcd25 bytes)
8(  8 mod 256): PUNCH    0x29238 thru 0x29f57   (0xd20 bytes)
9(  9 mod 256): COPY 0x3000 thru 0x9fff (0x7000 bytes) to 0x40400 thru 0x473ff
10( 10 mod 256): READ     0x16000 thru 0x21fff  (0xc000 bytes)
11( 11 mod 256): FALLOC   0x4a42b thru 0x4b8f7  (0x14cc bytes) INTERIOR
12( 12 mod 256): TRUNCATE DOWN  from 0x51800 to 0x15c00 ******WWWW
13( 13 mod 256): SKIPPED (no operation)
14( 14 mod 256): READ     0xb000 thru 0x14fff   (0xa000 bytes)
15( 15 mod 256): SKIPPED (no operation)
16( 16 mod 256): SKIPPED (no operation)
17( 17 mod 256): SKIPPED (no operation)
18( 18 mod 256): READ     0x3000 thru 0x11fff   (0xf000 bytes)
19( 19 mod 256): FALLOC   0x69b94 thru 0x6c922  (0x2d8e bytes) EXTENDING
20( 20 mod 256): SKIPPED (no operation)
21( 21 mod 256): SKIPPED (no operation)
22( 22 mod 256): WRITE    0x23000 thru 0x285ff  (0x5600 bytes)
23( 23 mod 256): SKIPPED (no operation)
24( 24 mod 256): SKIPPED (no operation)
25( 25 mod 256): SKIPPED (no operation)
26( 26 mod 256): ZERO     0x1fba0 thru 0x2c568  (0xc9c9 bytes)  ******ZZZZ
27( 27 mod 256): READ     0x4f000 thru 0x50fff  (0x2000 bytes)
28( 28 mod 256): READ     0x39000 thru 0x3afff  (0x2000 bytes)
29( 29 mod 256): WRITE    0x40200 thru 0x4cdff  (0xcc00 bytes)
30( 30 mod 256): SKIPPED (no operation)
31( 31 mod 256): WRITE    0x47e00 thru 0x547ff  (0xca00 bytes)
32( 32 mod 256): SKIPPED (no operation)
33( 33 mod 256): READ     0x28000 thru 0x29fff  (0x2000 bytes)
34( 34 mod 256): SKIPPED (no operation)
35( 35 mod 256): READ     0x69000 thru 0x6bfff  (0x3000 bytes)
36( 36 mod 256): READ     0x16000 thru 0x20fff  (0xb000 bytes)
37( 37 mod 256): ZERO     0x45150 thru 0x47e9c  (0x2d4d bytes)
38( 38 mod 256): SKIPPED (no operation)
39( 39 mod 256): SKIPPED (no operation)
40( 40 mod 256): COPY 0x10000 thru 0x11fff      (0x2000 bytes) to 0x22a00 thru 0x249ff
41( 41 mod 256): WRITE    0x29000 thru 0x2efff  (0x6000 bytes)
42( 42 mod 256): ZERO     0x59c7 thru 0x13eee   (0xe528 bytes)
43( 43 mod 256): FALLOC   0x1fdbf thru 0x2e694  (0xe8d5 bytes) INTERIOR ******FFFF
44( 44 mod 256): SKIPPED (no operation)
45( 45 mod 256): ZERO     0x740f5 thru 0x7a11f  (0x602b bytes)
46( 46 mod 256): SKIPPED (no operation)
47( 47 mod 256): WRITE    0x14200 thru 0x1e3ff  (0xa200 bytes)
48( 48 mod 256): READ     0x69000 thru 0x6bfff  (0x3000 bytes)
49( 49 mod 256): TRUNCATE DOWN  from 0x6c922 to 0x16a00 ******WWWW
50( 50 mod 256): WRITE    0x15000 thru 0x163ff  (0x1400 bytes)
51( 51 mod 256): PUNCH    0x3b5e thru 0xa2c1    (0x6764 bytes)
52( 52 mod 256): SKIPPED (no operation)
53( 53 mod 256): SKIPPED (no operation)
54( 54 mod 256): WRITE    0x34a00 thru 0x3fdff  (0xb400 bytes) HOLE     ***WWWW
55( 55 mod 256): WRITE    0x38000 thru 0x397ff  (0x1800 bytes)
56( 56 mod 256): PUNCH    0x7922 thru 0x115f0   (0x9ccf bytes)
57( 57 mod 256): SKIPPED (no operation)
58( 58 mod 256): SKIPPED (no operation)
59( 59 mod 256): SKIPPED (no operation)
60( 60 mod 256): FALLOC   0x300a8 thru 0x331d0  (0x3128 bytes) INTERIOR
61( 61 mod 256): ZERO     0x3799c thru 0x39245  (0x18aa bytes)
62( 62 mod 256): ZERO     0x62fc3 thru 0x6b630  (0x866e bytes)
63( 63 mod 256): SKIPPED (no operation)
64( 64 mod 256): ZERO     0x6110a thru 0x61dad  (0xca4 bytes)
65( 65 mod 256): FALLOC   0x1d8ca thru 0x20876  (0x2fac bytes) INTERIOR
66( 66 mod 256): COPY 0x65000 thru 0x68fff      (0x4000 bytes) to 0x22400 thru 0x263ff
67( 67 mod 256): SKIPPED (no operation)
68( 68 mod 256): WRITE    0x36a00 thru 0x415ff  (0xac00 bytes)
69( 69 mod 256): READ     0x18000 thru 0x26fff  (0xf000 bytes)  ***RRRR***
Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops
Correct content saved for comparison
(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood")

Guess what? The fsx parameters being used means it is testing things you
aren't. Yes, the '-Z -R -W' mean it is using direct IO for reads and writes,
mmap() is disabled. Other parameters indicate that using 4k aligned reads and
512 byte aligned writes and truncates.

There is a reason there are multiple different fsx tests in fstests;
they all exercise different sets of IO behaviours and alignments,
and they exercise the IO paths differently.

So there's clearly something wrong here - it's likely that the
filesystem IO alignment parameters pulled from the underlying block
device (4k physical, 512 byte logical sector sizes) are improperly
interpreted.  i.e. for a filesystem with a sector size of 4kB,
direct IO with an alignment of 512 bytes should be rejected......

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux