Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andreas Gruenbacher <agruenba@xxxxxxxxxx> writes:

> On Mon, Jun 19, 2023 at 4:29 AM Ritesh Harjani (IBM)
> <ritesh.list@xxxxxxxxx> wrote:
>> When filesystem blocksize is less than folio size (either with
>> mapping_large_folio_support() or with blocksize < pagesize) and when the
>> folio is uptodate in pagecache, then even a byte write can cause
>> an entire folio to be written to disk during writeback. This happens
>> because we currently don't have a mechanism to track per-block dirty
>> state within struct iomap_folio_state. We currently only track uptodate
>> state.
>>
>> This patch implements support for tracking per-block dirty state in
>> iomap_folio_state->state bitmap. This should help improve the filesystem
>> write performance and help reduce write amplification.
>>
>> Performance testing of below fio workload reveals ~16x performance
>> improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
>> FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
>>
>> 1. <test_randwrite.fio>
>> [global]
>>         ioengine=psync
>>         rw=randwrite
>>         overwrite=1
>>         pre_read=1
>>         direct=0
>>         bs=4k
>>         size=1G
>>         dir=./
>>         numjobs=8
>>         fdatasync=1
>>         runtime=60
>>         iodepth=64
>>         group_reporting=1
>>
>> [fio-run]
>>
>> 2. Also our internal performance team reported that this patch improves
>>    their database workload performance by around ~83% (with XFS on Power)
>>
>> Reported-by: Aravinda Herle <araherle@xxxxxxxxxx>
>> Reported-by: Brian Foster <bfoster@xxxxxxxxxx>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
>> ---
>>  fs/gfs2/aops.c         |   2 +-
>>  fs/iomap/buffered-io.c | 189 ++++++++++++++++++++++++++++++++++++-----
>>  fs/xfs/xfs_aops.c      |   2 +-
>>  fs/zonefs/file.c       |   2 +-
>>  include/linux/iomap.h  |   1 +
>>  5 files changed, 171 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
>> index a5f4be6b9213..75efec3c3b71 100644
>> --- a/fs/gfs2/aops.c
>> +++ b/fs/gfs2/aops.c
>> @@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
>>         .writepages = gfs2_writepages,
>>         .read_folio = gfs2_read_folio,
>>         .readahead = gfs2_readahead,
>> -       .dirty_folio = filemap_dirty_folio,
>> +       .dirty_folio = iomap_dirty_folio,
>>         .release_folio = iomap_release_folio,
>>         .invalidate_folio = iomap_invalidate_folio,
>>         .bmap = gfs2_bmap,
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 391d918ddd22..50f5840bb5f9 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -25,7 +25,7 @@
>>
>>  typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
>>  /*
>> - * Structure allocated for each folio to track per-block uptodate state
>> + * Structure allocated for each folio to track per-block uptodate, dirty state
>>   * and I/O completions.
>>   */
>>  struct iomap_folio_state {
>> @@ -35,31 +35,55 @@ struct iomap_folio_state {
>>         unsigned long           state[];
>>  };
>>
>> +enum iomap_block_state {
>> +       IOMAP_ST_UPTODATE,
>> +       IOMAP_ST_DIRTY,
>> +
>> +       IOMAP_ST_MAX,
>> +};
>> +
>> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
>> +               enum iomap_block_state state, unsigned int *first_blkp,
>> +               unsigned int *nr_blksp)
>> +{
>> +       struct inode *inode = folio->mapping->host;
>> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> +       unsigned int first = off >> inode->i_blkbits;
>> +       unsigned int last = (off + len - 1) >> inode->i_blkbits;
>> +
>> +       *first_blkp = first + (state * blks_per_folio);
>> +       *nr_blksp = last - first + 1;
>> +}
>> +
>>  static struct bio_set iomap_ioend_bioset;
>>
>>  static inline bool ifs_is_fully_uptodate(struct folio *folio,
>>                                                struct iomap_folio_state *ifs)
>>  {
>>         struct inode *inode = folio->mapping->host;
>> +       unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> +       unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
>
> This nr_blks calculation doesn't make sense.
>

About this, I have replied with more details here [1]

[1]: https://lore.kernel.org/linux-xfs/87o7lbmnam.fsf@xxxxxxx/


>> -       return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>> +       return bitmap_full(ifs->state, nr_blks);
>
> Could you please change this to:
>
> BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);

ditto

> return bitmap_full(ifs->state, blks_per_folio);
>
> Also, I'm seeing that the value of i_blocks_per_folio() is assigned to
> local variables with various names in several places (blks_per_folio,
> nr_blocks, nblocks). Maybe this could be made consistent.
>

I remember giving it a try, but then it was really not worth it because
all of above naming also does make sense in the way they are getting
used currently in the code. So, I think as long as it is clear behind
what those variables means and how are they getting used, it should be
ok, IMO. 

-ritesh




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux