On 2024/5/1 16:33, Dave Chinner wrote: > On Wed, May 01, 2024 at 06:11:13PM +1000, Dave Chinner wrote: >> On Wed, Apr 10, 2024 at 10:29:38PM +0800, Zhang Yi wrote: >>> From: Zhang Yi <yi.zhang@xxxxxxxxxx> >>> >>> Implement buffered write iomap path, use ext4_da_map_blocks() to map >>> delalloc extents and add ext4_iomap_get_blocks() to allocate blocks if >>> delalloc is disabled or free space is about to run out. >>> >>> Note that we always allocate unwritten extents for new blocks in the >>> iomap write path, this means that the allocation type is no longer >>> controlled by the dioread_nolock mount option. After that, we could >>> postpone the i_disksize updating to the writeback path, and drop journal >>> handle in the buffered dealloc write path completely. > ..... >>> +/* >>> + * Drop the staled delayed allocation range from the write failure, >>> + * including both start and end blocks. If not, we could leave a range >>> + * of delayed extents covered by a clean folio, it could lead to >>> + * inaccurate space reservation. >>> + */ >>> +static int ext4_iomap_punch_delalloc(struct inode *inode, loff_t offset, >>> + loff_t length) >>> +{ >>> + ext4_es_remove_extent(inode, offset >> inode->i_blkbits, >>> + DIV_ROUND_UP_ULL(length, EXT4_BLOCK_SIZE(inode->i_sb))); >>> return 0; >>> } >>> >>> +static int ext4_iomap_buffered_write_end(struct inode *inode, loff_t offset, >>> + loff_t length, ssize_t written, >>> + unsigned int flags, >>> + struct iomap *iomap) >>> +{ >>> + handle_t *handle; >>> + loff_t end; >>> + int ret = 0, ret2; >>> + >>> + /* delalloc */ >>> + if (iomap->flags & IOMAP_F_EXT4_DELALLOC) { >>> + ret = iomap_file_buffered_write_punch_delalloc(inode, iomap, >>> + offset, length, written, ext4_iomap_punch_delalloc); >>> + if (ret) >>> + ext4_warning(inode->i_sb, >>> + "Failed to clean up delalloc for inode %lu, %d", >>> + inode->i_ino, ret); >>> + return ret; >>> + } >> >> Why are you creating a delalloc extent for the write operation and >> then immediately deleting it from the extent tree once the write >> operation is done? > > Ignore this, I mixed up the ext4_iomap_punch_delalloc() code > directly above with iomap_file_buffered_write_punch_delalloc(). > > In hindsight, iomap_file_buffered_write_punch_delalloc() is poorly > named, as it is handling a short write situation which requires > newly allocated delalloc blocks to be punched. > iomap_file_buffered_write_finish() would probably be a better name > for it.... > >> Also, why do you need IOMAP_F_EXT4_DELALLOC? Isn't a delalloc iomap >> set up with iomap->type = IOMAP_DELALLOC? Why can't that be used? > > But this still stands - the first thing > iomap_file_buffered_write_punch_delalloc() is: > > if (iomap->type != IOMAP_DELALLOC) > return 0; > Thanks for the suggestion, the delalloc and non-delalloc write paths share the same ->iomap_end() now (i.e. ext4_iomap_buffered_write_end()), I use the IOMAP_F_EXT4_DELALLOC to identify the write path. For non-delalloc path, If we have allocated more blocks and copied less, we should truncate extra blocks that newly allocated by ->iomap_begin(). If we use IOMAP_DELALLOC, we can't tell if the blocks are pre-existing or newly allocated, we can't truncate the pre-existing blocks, so I have to introduce IOMAP_F_EXT4_DELALLOC. But if we split the delalloc and non-delalloc handler, we could drop IOMAP_F_EXT4_DELALLOC. I also checked xfs, IIUC, xfs doesn't free the extra blocks beyond EOF in xfs_buffered_write_iomap_end() for non-delalloc case since they will be freed by xfs_free_eofblocks in some other inactive paths, like xfs_release()/xfs_inactive()/..., is that right? Thanks, Yi.