Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> writes: > Christoph Hellwig <hch@xxxxxxxxxxxxx> writes: > >> On Wed, Nov 22, 2023 at 01:29:46PM +0100, Jan Kara wrote: >>> writeback bit set. XFS plays the revalidation sequence counter games >>> because of this so we'd have to do something similar for ext2. Not that I'd >>> care as much about ext2 writeback performance but it should not be that >>> hard and we'll definitely need some similar solution for ext4 anyway. Can >>> you give that a try (as a followup "performance improvement" patch). >> >> Darrick has mentioned that he is looking into lifting more of the >> validation sequence counter validation into iomap. >> >> In the meantime I have a series here that at least maps multiple blocks >> inside a folio in a single go, which might be worth trying with ext2 as >> well: >> >> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iomap-map-multiple-blocks > > Sure, thanks for providing details. I will check this. > So I took a look at this. Here is what I think - So this is useful of-course when we have a large folio. Because otherwise it's just one block at a time for each folio. This is not a problem once FS buffered-io handling code moves to iomap (because we can then enable large folio support to it). However, this would still require us to pass a folio to ->map_blocks call to determine the size of the folio (which I am not saying can't be done but just stating my observations here). Now coming to implementing validation seq check. I am hoping, it should be straight forward at least for ext2, because it mostly just have to protect against truncate operation (which can change the block mapping)... ...ok so here is the PoC for seq counter check for ext2. Next let me try to see if we can lift this up from the FS side to iomap - >From 86b7bdf79107c3d0a4f37583121d7c136db1bc5c Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx> Subject: [PATCH] ext2: Implement seq check for mapping multiblocks in ->map_blocks This implements inode block seq counter (ib_seq) which helps in validating whether the cached wpc (struct iomap_writepage_ctx) still has the valid entries or not. Block mapping can get changed say due to a racing truncate. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> --- fs/ext2/balloc.c | 1 + fs/ext2/ext2.h | 6 ++++++ fs/ext2/inode.c | 51 +++++++++++++++++++++++++++++++++++++++++++----- fs/ext2/super.c | 2 +- 4 files changed, 54 insertions(+), 6 deletions(-) diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c index e124f3d709b2..e97040194da4 100644 --- a/fs/ext2/balloc.c +++ b/fs/ext2/balloc.c @@ -495,6 +495,7 @@ void ext2_free_blocks(struct inode * inode, ext2_fsblk_t block, } ext2_debug ("freeing block(s) %lu-%lu\n", block, block + count - 1); + ext2_inc_ib_seq(EXT2_I(inode)); do_more: overflow = 0; diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 677a9ad45dcb..882c14d20183 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -663,6 +663,7 @@ struct ext2_inode_info { struct rw_semaphore xattr_sem; #endif rwlock_t i_meta_lock; + unsigned int ib_seq; /* * truncate_mutex is for serialising ext2_truncate() against @@ -698,6 +699,11 @@ static inline struct ext2_inode_info *EXT2_I(struct inode *inode) return container_of(inode, struct ext2_inode_info, vfs_inode); } +static inline void ext2_inc_ib_seq(struct ext2_inode_info *ei) +{ + WRITE_ONCE(ei->ib_seq, READ_ONCE(ei->ib_seq) + 1); +} + /* balloc.c */ extern int ext2_bg_has_super(struct super_block *sb, int group); extern unsigned long ext2_bg_num_gdb(struct super_block *sb, int group); diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index f4e0b9a93095..a5490d641c26 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -406,6 +406,8 @@ static int ext2_alloc_blocks(struct inode *inode, ext2_fsblk_t current_block = 0; int ret = 0; + ext2_inc_ib_seq(EXT2_I(inode)); + /* * Here we try to allocate the requested multiple blocks at once, * on a best-effort basis. @@ -966,14 +968,53 @@ ext2_writepages(struct address_space *mapping, struct writeback_control *wbc) return mpage_writepages(mapping, wbc, ext2_get_block); } +struct ext2_writepage_ctx { + struct iomap_writepage_ctx ctx; + unsigned int ib_seq; +}; + +static inline +struct ext2_writepage_ctx *EXT2_WPC(struct iomap_writepage_ctx *ctx) +{ + return container_of(ctx, struct ext2_writepage_ctx, ctx); +} + +static bool ext2_imap_valid(struct iomap_writepage_ctx *wpc, struct inode *inode, + loff_t offset) +{ + if (offset < wpc->iomap.offset || + offset >= wpc->iomap.offset + wpc->iomap.length) + return false; + + if (EXT2_WPC(wpc)->ib_seq != READ_ONCE(EXT2_I(inode)->ib_seq)) + return false; + + return true; +} + static int ext2_write_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode, loff_t offset) { - if (offset >= wpc->iomap.offset && - offset < wpc->iomap.offset + wpc->iomap.length) + loff_t maxblocks = (loff_t)INT_MAX; + u8 blkbits = inode->i_blkbits; + u32 bno; + bool new, boundary; + int ret; + + if (ext2_imap_valid(wpc, inode, offset)) return 0; - return ext2_iomap_begin(inode, offset, inode->i_sb->s_blocksize, + EXT2_WPC(wpc)->ib_seq = READ_ONCE(EXT2_I(inode)->ib_seq); + + ret = ext2_get_blocks(inode, offset >> blkbits, maxblocks << blkbits, + &bno, &new, &boundary, 0); + if (ret < 0) + return ret; + /* + * ret can be 0 in case of a hole which is possible for mmaped writes. + */ + ret = ret ? ret : 1; + return ext2_iomap_begin(inode, offset, (loff_t)ret << blkbits, IOMAP_WRITE, &wpc->iomap, NULL); } @@ -984,9 +1025,9 @@ static const struct iomap_writeback_ops ext2_writeback_ops = { static int ext2_file_writepages(struct address_space *mapping, struct writeback_control *wbc) { - struct iomap_writepage_ctx wpc = { }; + struct ext2_writepage_ctx wpc = { }; - return iomap_writepages(mapping, wbc, &wpc, &ext2_writeback_ops); + return iomap_writepages(mapping, wbc, &wpc.ctx, &ext2_writeback_ops); } static int diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 645ee6142f69..cd1d1678e35b 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -188,7 +188,7 @@ static struct inode *ext2_alloc_inode(struct super_block *sb) #ifdef CONFIG_QUOTA memset(&ei->i_dquot, 0, sizeof(ei->i_dquot)); #endif - + WRITE_ONCE(ei->ib_seq, 0); return &ei->vfs_inode; } 2.30.2