With the preceding two patches, it's now possible to accurately determine the number of delayed allocated clusters to be released when removing a block range from the extent tree and extents status tree in ext4_ext_truncate(), ext4_punch_hole(), and ext4_collapse_range(). Since it's not possible to do this when invalidating pages in ext4_da_page_release_reservation(), remove those operations from it. It's still appropriate to clear the delayed bits for invalidated buffers there. Removal of block ranges in ext4_ext_truncate() and other functions appears redundant with removal at page invalidate time, and removal of a block range as a unit should generally incur less CPU overhead than page by page (block by block) removal. Note: this change does result in a regression for generic/036 when running at least the 4k, bigalloc, and bigalloc_1k test cases using the xfstests-bld test appliance. The test passes, but new kernel error messages of the form "Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!" appear in dmesg. It's likely this patch violates a direct I/O implementation requirement, perhaps making DIO vulnerable to read races with buffered I/O. To be addressed. Signed-off-by: Eric Whitney <enwlinux@xxxxxxxxx> --- fs/ext4/inode.c | 51 +++++++++++++-------------------------------------- 1 file changed, 13 insertions(+), 38 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a55b4db4a29c..6a903a850d95 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1628,62 +1628,37 @@ void ext4_da_release_space(struct inode *inode, int to_free) dquot_release_reservation_block(inode, EXT4_C2B(sbi, to_free)); } +/* + * This code doesn't work and needs to be replaced. Not deleting delayed + * blocks from the extents status tree here and deferring until blocks + * are deleted from the extent tree causes problems for DIO. One avenue + * to explore is a partial reversion of the code here, altering the + * calls to delete blocks from the extents status tree to calls to + * invalidate those blocks, hopefully avoiding problems with the DIO code. + * They would then be deleted and accounted for when the extent tree is + * modified. + */ static void ext4_da_page_release_reservation(struct page *page, unsigned int offset, unsigned int length) { - int to_release = 0, contiguous_blks = 0; struct buffer_head *head, *bh; unsigned int curr_off = 0; - struct inode *inode = page->mapping->host; - struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); unsigned int stop = offset + length; - int num_clusters; - ext4_fsblk_t lblk; + unsigned int next_off; BUG_ON(stop > PAGE_SIZE || stop < length); head = page_buffers(page); bh = head; do { - unsigned int next_off = curr_off + bh->b_size; - + next_off = curr_off + bh->b_size; if (next_off > stop) break; - - if ((offset <= curr_off) && (buffer_delay(bh))) { - to_release++; - contiguous_blks++; + if ((offset <= curr_off) && (buffer_delay(bh))) clear_buffer_delay(bh); - } else if (contiguous_blks) { - lblk = page->index << - (PAGE_SHIFT - inode->i_blkbits); - lblk += (curr_off >> inode->i_blkbits) - - contiguous_blks; - ext4_es_remove_extent(inode, lblk, contiguous_blks); - contiguous_blks = 0; - } curr_off = next_off; } while ((bh = bh->b_this_page) != head); - - if (contiguous_blks) { - lblk = page->index << (PAGE_SHIFT - inode->i_blkbits); - lblk += (curr_off >> inode->i_blkbits) - contiguous_blks; - ext4_es_remove_extent(inode, lblk, contiguous_blks); - } - - /* If we have released all the blocks belonging to a cluster, then we - * need to release the reserved space for that cluster. */ - num_clusters = EXT4_NUM_B2C(sbi, to_release); - while (num_clusters > 0) { - lblk = (page->index << (PAGE_SHIFT - inode->i_blkbits)) + - ((num_clusters - 1) << sbi->s_cluster_bits); - if (sbi->s_cluster_ratio == 1 || - !ext4_find_delalloc_cluster(inode, lblk)) - ext4_da_release_space(inode, 1); - - num_clusters--; - } } /* -- 2.11.0