On 2023-08-05 15:37, Dave Chinner wrote:
On Fri, Aug 04, 2023 at 06:44:47PM -0700, Corey Hickey wrote:
On 2023-08-04 14:52, Dave Chinner wrote:
On Fri, Aug 04, 2023 at 12:26:22PM -0700, Corey Hickey wrote:
On 2023-08-04 01:07, Dave Chinner wrote:
If you want to force XFS to do stripe width aligned allocation for
large files to match with how MD exposes its topology to
filesystems, use the 'swalloc' mount option. The downside is that
you'll hotspot the first disk in the MD array....
If I use 'swalloc' with the autodetected (wrong) swidth, I don't see any
unaligned writes.
If I manually specify the (I think) correct values, I do still get writes
aligned to sunit but not swidth, as before.
Hmmm, it should not be doing that - where is the misalignment
happening in the file? swalloc isn't widely used/tested, so there's
every chance there's something unexpected going on in the code...
I don't know how to tell the file position, but I wrote a one-liner for
blktrace that may help. This should tell the position within the block
device of writes enqueued.
xfs_bmap will tell you the file extent layout (offset to lba relationship).
(`xfs_bmap -vvp <file>` output is preferred if you are going to paste
it into an email.)
Ah, nice; the flags even show the alignment.
Here are the results for a filesystem on a 2-data-disk RAID-5 with 128 KB
chunk size.
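For reference, the sunit/swidth values used in this test follow from the array geometry. The sketch below assumes that the mkfs.xfs -d sunit=/swidth= options take 512-byte units and that the filesystem block size is 4096 bytes; the function name stripe_geometry is only illustrative.

```python
SECTOR = 512       # unit assumed for mkfs.xfs -d sunit= / swidth=
FS_BLOCK = 4096    # filesystem block size (bsize) used in this test


def stripe_geometry(chunk_bytes, data_disks):
    """Return stripe unit and stripe width in sectors and in fs blocks."""
    sunit_bytes = chunk_bytes                 # one chunk on one data disk
    swidth_bytes = chunk_bytes * data_disks   # one full stripe of data
    return {
        "sunit_sectors": sunit_bytes // SECTOR,
        "swidth_sectors": swidth_bytes // SECTOR,
        "sunit_blocks": sunit_bytes // FS_BLOCK,
        "swidth_blocks": swidth_bytes // FS_BLOCK,
    }


# 2 data disks, 128 KB chunk size
geom = stripe_geometry(128 * 1024, 2)
print(geom)
```

With these inputs the sector values are 256 and 512, and the block values are 32 and 64.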
$ sudo mkfs.xfs -s size=4096 -d sunit=256,swidth=512 /dev/md5 -f
meta-data=/dev/md5               isize=512    agcount=16, agsize=983008 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=15728128, imaxpct=25
         =                       sunit=32     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount -o noatime,swalloc /dev/md5 /mnt/tmp
$ sudo dd if=/dev/zero of=/mnt/tmp/test.bin iflag=fullblock oflag=direct bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 62.6102 s, 171 MB/s
$ sudo xfs_bmap -vvp /mnt/tmp/test.bin
/mnt/tmp/test.bin:
 EXT: FILE-OFFSET              BLOCK-RANGE           AG AG-OFFSET           TOTAL   FLAGS
   0: [0..7806975]:            512..7807487           0 (512..7807487)      7806976 000000
   1: [7806976..15613951]:     7864576..15671551      1 (512..7807487)      7806976 000011
   2: [15613952..20971519]:    15728640..21086207     2 (512..5358079)      5357568 000000
FLAG Values:
0100000 Shared extent
0010000 Unwritten preallocated extent
0001000 Doesn't begin on stripe unit
0000100 Doesn't end on stripe unit
0000010 Doesn't begin on stripe width
0000001 Doesn't end on stripe width
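The FLAGS column can be cross-checked by hand from the BLOCK-RANGE values. The sketch below is not xfs_bmap's own code; it assumes that BLOCK-RANGE is in 512-byte units, that the flag values in the legend are octal, and that sunit and swidth are 256 and 512 sectors as given to mkfs.xfs.

```python
SUNIT = 256    # stripe unit in 512-byte sectors (128 KB chunk)
SWIDTH = 512   # stripe width in 512-byte sectors (2 data disks)

# Inclusive (start, end) pairs copied from the BLOCK-RANGE column above.
EXTENTS = [
    (512, 7807487),
    (7864576, 15671551),
    (15728640, 21086207),
]


def alignment_flags(start, end):
    """Recompute the four alignment bits for one extent (assumed octal)."""
    flags = 0
    if start % SUNIT:
        flags |= 0o1000   # doesn't begin on stripe unit
    if (end + 1) % SUNIT:
        flags |= 0o0100   # doesn't end on stripe unit
    if start % SWIDTH:
        flags |= 0o0010   # doesn't begin on stripe width
    if (end + 1) % SWIDTH:
        flags |= 0o0001   # doesn't end on stripe width
    return flags


results = [format(alignment_flags(s, e), "06o") for s, e in EXTENTS]
for (s, e), f in zip(EXTENTS, results):
    print(f"{s}..{e}: {f}")
```

Under those assumptions the recomputed values agree with the FLAGS column: extent 1 starts and ends on a stripe unit boundary but sits half a stripe width off, while extents 0 and 2 are fully aligned.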
One thing to try is to set extent size hints for the directories
these large files are going to be written to. That takes a lot of
the allocation decisions away from the size/shape of the individual
IO and instead does large file offset aligned/sized allocations
which are much more likely to be stripe width aligned. e.g. set an
extent size hint of 16MB, and the first write into a hole will
allocate a 16MB chunk around the write instead of just the size that
covers the write IO.
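A simplified model of what "a 16MB chunk around the write" means in file-offset terms is sketched below. It only rounds the write range out to extent size hint boundaries; it does not model the XFS allocator, and where the blocks land on disk is still the allocator's decision. The helper name hinted_range is hypothetical.

```python
MiB = 1024 * 1024
EXTSIZE = 16 * MiB            # extent size hint from the example above
SWIDTH_BYTES = 256 * 1024     # stripe width of the 2-data-disk, 128 KB array


def hinted_range(offset, length, extsize=EXTSIZE):
    """Round a write's byte range out to extsize-aligned file offsets."""
    start = (offset // extsize) * extsize
    end = -(-(offset + length) // extsize) * extsize   # ceiling division
    return start, end


# A 1 MiB write at file offset 20 MiB, landing in a hole.
start, end = hinted_range(20 * MiB, 1 * MiB)
print(start // MiB, end // MiB)

# The hint is a whole number of stripe widths, so each hinted chunk
# covers complete stripes when its start is stripe width aligned.
print(EXTSIZE % SWIDTH_BYTES == 0, EXTSIZE // SWIDTH_BYTES)
```

In this model the 1 MiB write maps to the file range from 16 MiB to 32 MiB, and a 16 MiB hint spans 64 stripe widths on this array.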
Can you please give me a documentation pointer for that? I wasn't able
to find the right thing via searching.
[...]
$ man xfs_io
....
extsize [ -R | -D ] [ value ]
[...]
Aha, thanks. That's what I was looking for.
-Corey