In the DAX case, ext4_iomap_begin() currently starts a journal txn for
every IOMAP_WRITE request. This can be optimized away for an overwrite,
where the blocks are already allocated. That can give a significant
performance boost for multi-threaded writes, especially random writes.

On a PPC64 VM with a simulated pmem device, a ~10x improvement was seen
for random writes (overwrites). Because this also avoids the spinlock
contention during jbd2 slab cache allocation (jbd2_journal_handle), a
~2x improvement was observed on an x86 VM.

Reported-by: Dan Williams <dan.j.williams@xxxxxxxxx>
Suggested-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Ritesh Harjani <riteshh@xxxxxxxxxxxxx>
---
 fs/ext4/inode.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 10dd470876b3..c18009c91e68 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3437,6 +3437,14 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
 			  EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
 
+	/*
+	 * In case of DAX write, we check if this is overwrite request, to avoid
+	 * starting a journal txn in ext4_iomap_alloc()
+	 */
+	if ((flags & IOMAP_WRITE) && IS_DAX(inode) &&
+	    ext4_overwrite_io(inode, &map, true))
+		goto out_set;
+
 	if (flags & IOMAP_WRITE)
 		ret = ext4_iomap_alloc(inode, &map, flags);
 	else
@@ -3444,9 +3452,8 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	if (ret < 0)
 		return ret;
-
+out_set:
 	ext4_set_iomap(inode, iomap, &map, offset, length);
-
 	return 0;
 }
 
-- 
2.25.4
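The fast path described above can be illustrated with a small userspace model. This is a hypothetical sketch, not ext4 code: toy_inode, toy_overwrite_io(), toy_iomap_alloc() and toy_iomap_begin_write() are invented names, and journal_starts only counts the points where a jbd2 handle would be started. It assumes a range counts as an overwrite only when every block in it is already allocated and written, so holes and unwritten extents (which still need allocation or conversion) take the journalled path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the DAX overwrite fast path.
 * None of these names exist in ext4. */
#define TOY_NBLOCKS 16u

enum toy_block_state {
	TOY_HOLE = 0,   /* no physical block allocated */
	TOY_UNWRITTEN,  /* allocated but unwritten: still needs conversion */
	TOY_WRITTEN,    /* allocated and written: can be overwritten in place */
};

struct toy_inode {
	enum toy_block_state blocks[TOY_NBLOCKS];
	unsigned int journal_starts; /* simulated journal txns started */
};

/* Assumed overwrite test: true only if every block in [lblk, lblk + len)
 * is allocated and written. */
static bool toy_overwrite_io(const struct toy_inode *inode,
			     unsigned int lblk, unsigned int len)
{
	if (len == 0 || lblk >= TOY_NBLOCKS || len > TOY_NBLOCKS - lblk)
		return false;
	for (unsigned int i = lblk; i < lblk + len; i++)
		if (inode->blocks[i] != TOY_WRITTEN)
			return false;
	return true;
}

/* Slow path: stands in for ext4_iomap_alloc(), which starts a txn. */
static int toy_iomap_alloc(struct toy_inode *inode,
			   unsigned int lblk, unsigned int len)
{
	if (len == 0 || lblk >= TOY_NBLOCKS || len > TOY_NBLOCKS - lblk)
		return -1; /* out of range, no txn started */
	inode->journal_starts++; /* models starting a jbd2 handle */
	for (unsigned int i = lblk; i < lblk + len; i++)
		inode->blocks[i] = TOY_WRITTEN; /* allocate or convert */
	/* After allocation the same range must qualify as an overwrite. */
	assert(toy_overwrite_io(inode, lblk, len));
	return 0;
}

/* Write side of the toy iomap_begin: skip the txn for pure overwrites. */
static int toy_iomap_begin_write(struct toy_inode *inode,
				 unsigned int lblk, unsigned int len)
{
	if (toy_overwrite_io(inode, lblk, len))
		return 0; /* like "goto out_set": reuse the existing mapping */
	return toy_iomap_alloc(inode, lblk, len);
}
```

In the model, the first write to a range pays for one simulated txn, while later overwrites of the same written range leave journal_starts unchanged. That skipped per-write handle setup is the cost the patch avoids for DAX overwrites.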