Re: [Question] About XFS random buffer write performance

[add hch and willy to cc]

On Tue, Jul 28, 2020 at 07:34:39PM +0800, Zhengyuan Liu wrote:
> Hi all,
> 
> When doing random buffered write testing I found that the bandwidth on
> EXT4 is much better than on XFS in the same environment.
> The test case, test result, and test environment are as follows:
> Test case:
> fio --ioengine=sync --rw=randwrite --iodepth=64 --size=4G --name=test
> --filename=/mnt/testfile --bs=4k
> Before running fio, use dd (if=/dev/zero of=/mnt/testfile bs=1M
> count=4096) to warm up the file in the page cache.
> 
> Test result (bandwidth):
>          ext4            xfs
>        ~300MB/s       ~120MB/s
> 
> Test environment:
>     Platform:  arm64
>     Kernel:  v5.7
>     PAGESIZE:  64K
>     Memtotal:  16G
>     Storage: SATA SSD (max bandwidth about 350MB/s)
>     FS block size: 4K
> 
> The fio "Test result" shows that EXT4 has more than 2x the bandwidth of
> XFS, yet iostat shows that XFS is also transferring to the SSD at about
> 300MB/s.  So I suspect XFS is writing back many non-dirty blocks to the
> SSD while writing back dirty pages.  I tried to read the core writeback
> code of both filesystems and found that
> XFS will write back blocks which are uptodate (see iomap_writepage_map()),

Ahhh, right, because iomap tracks uptodate state separately for each block
in the page, but only tracks dirty status for the whole page.  Hence if you
dirty one byte in a 64k page, xfs will write back all 64k even though we
could get away with writing just the 4k that ext4 does.
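
To make the asymmetry concrete, the per-page state iomap keeps looks
roughly like this (a simplified sketch of the v5.7-era struct iomap_page
in fs/iomap/buffered-io.c; treat the field names and sizing as
approximate):

	struct iomap_page {
		atomic_t	read_count;	/* read I/Os in flight */
		atomic_t	write_count;	/* write I/Os in flight */
		spinlock_t	uptodate_lock;
		/* one uptodate bit per block within the page */
		DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
	};

	/*
	 * Dirtiness, by contrast, is only recorded page-wide via the
	 * PageDirty flag, so at writeback time we can only ask "is this
	 * block uptodate?", never "is this block dirty?", and end up
	 * writing every uptodate block of a dirty page.
	 */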

Hey Christoph & Matthew: If you're already thinking about changing
struct iomap_page, should we add the ability to track per-block dirty
state to reduce the write amplification that Zhengyuan is asking about?

I'm guessing that between willy's THP series, Dave's iomap chunks
series, and whatever Christoph may or may not be writing, at least one
of you might have already implemented this? :)
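
To be explicit about the shape of what I'm asking for, the naive version
would look something like this (purely illustrative; none of these names
or helpers exist today):

	/* hypothetical second bitmap next to the uptodate one */
	DECLARE_BITMAP(dirty, PAGE_SIZE / 512);

	/* buffered write and mmap fault paths would mark only the
	 * blocks they actually touched, e.g.:
	 *	iomap_set_range_dirty(page, offset, len);
	 * and the writeback loop would skip blocks whose dirty bit is
	 * clear instead of writing every uptodate block: */
	for (i = 0; i < blocks_per_page; i++) {
		if (iop && !test_bit(i, iop->dirty))
			continue;
		/* map the block and add it to the ioend as we do today */
	}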

--D

> while EXT4 only writes back blocks which are dirty (see
> ext4_bio_write_page()).  XFS switched from buffer heads to iomap in v4.8,
> and iomap only has a bitmap to track each block's uptodate status; I found
> no per-block 'dirty' state.  My question is whether this is the reason why
> XFS writes so many extra blocks to the SSD when doing random buffered
> writes.  If it is, then why don't we track the dirty status of blocks in
> XFS?
> 
> With these questions in mind, I started digging into XFS's history and
> found an annotation in v2.6.12:
>         /*
>          * Calling this without startio set means we are being asked to make
>          * a dirty page ready for freeing it's buffers.  When called with
>          * startio set then we are coming from writepage.
>          * When called with startio set it is important that we write the WHOLE
>          * page if possible.
>          * The bh->b_state's cannot know if any of the blocks or which block for
>          * that matter are dirty due to mmap writes, and therefore bh uptodate is
>          * only vaild if the page itself isn't completely uptodate.  Some layers
>          * may clear the page dirty flag prior to calling write page, under the
>          * assumption the entire page will be written out; by not writing out the
>          * whole page the page can be reused before all valid dirty data is
>          * written out.  Note: in the case of a page that has been dirty'd by
>          * mapwrite and but partially setup by block_prepare_write the
>          * bh->b_states's will not agree and only ones setup by BPW/BCW will
>          * have valid state, thus the whole page must be written out thing.
>          */
>         STATIC int xfs_page_state_convert()
> 
> From the above annotation it seems this has something to do with mmap, but
> I can't quite get the point, so I'm turning to you for help.  Anyway, I
> don't think there should be such a difference between XFS and EXT4 for
> random buffered writes.
> 
> Any reply would be appreciated.  Thanks in advance.


