Jan Kara <jack@xxxxxxx> writes: > On Sun 16-04-23 15:38:37, Ritesh Harjani (IBM) wrote: >> Some of the higher layers like iomap takes inode_lock() when calling >> generic_write_sync(). >> Also writeback already happens from other paths without inode lock, >> so it's difficult to say that we really need sync_mapping_buffers() to >> take any inode locking here. Having said that, let's add >> generic_buffer_fsync() implementation in buffer.c with no >> inode_lock/unlock() for now so that filesystems like ext2 and >> ext4's nojournal mode can use it. >> >> Ext4 when got converted to iomap for direct-io already copied it's own >> variant of __generic_file_fsync() without lock. Hence let's add a helper >> API and use it both in ext2 and ext4. >> >> Later we can review other filesystems as well to see if we can make >> generic_buffer_fsync() which does not take any inode_lock() as the >> default path. >> >> Tested-by: Disha Goel <disgoel@xxxxxxxxxxxxx> >> Reviewed-by: Christoph Hellwig <hch@xxxxxx> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> > > There is a problem with generic_buffer_fsync() that it does not call > blkdev_issue_flush() so the caller is responsible for doing that. That's > necessary for ext2 & ext4 so fine for now. But historically this was the > case with generic_file_fsync() as well and that led to many filesystem > forgetting to flush caches from fsync(2). Ok, thanks for the details. > What is our transition plan for > these filesystems that currently do the cache flush from > generic_file_fsync()? Do we want to eventually keep generic_file_fsync() > doing the cache flush and call generic_buffer_fsync() instead of > __generic_buffer_fsync() from it? Frankly speaking, I was thinking we will come back to this question maybe when we start working on those changes. At this point in time I only looked at it from ext2 DIO changes perspective. But since you asked, here is what I think we could do - Rename generic_file_fsync => generic_buffers_sync() to fs/buffers.c Then generic_buffers_sync() { ret = generic_buffers_fsync() if (!ret) blkdev_issue_flush() } generic_buffers_fsync() is same as in this patch which does not have the cache flush operation. (will rename from generic_buffer_fsync() to generic_buffers_fsync()) Note: The naming is kept such that- - sync means it will do fsync followed by cache flush. - fsync means it will only do the file fsync As I understand - we would eventually like to kill the inode_lock() variants of generic_file_fsync() and __generic_file_fsync() after auditing other filesystem code, right? Then for now what we need is generic_buffers_sync() function which does not take an inode_lock() and also does cache flush which is required for ext2. And generic_buffers_fsync() which does not do any cache flush operations required by filesystem like ext4. Does that sound good to you? Is the naming also proper? Is yes, then I can rename the below function to generic_buffers_fsync() and also create implementation of generic_buffers_sync(). Then let ext2 and ext4 use them. -ritesh > > Honza > >> --- >> fs/buffer.c | 43 +++++++++++++++++++++++++++++++++++++ >> include/linux/buffer_head.h | 2 ++ >> 2 files changed, 45 insertions(+) >> >> diff --git a/fs/buffer.c b/fs/buffer.c >> index 9e1e2add541e..df98f1966a71 100644 >> --- a/fs/buffer.c >> +++ b/fs/buffer.c >> @@ -593,6 +593,49 @@ int sync_mapping_buffers(struct address_space *mapping) >> } >> EXPORT_SYMBOL(sync_mapping_buffers); >> >> +/** >> + * generic_buffer_fsync - generic buffer fsync implementation >> + * for simple filesystems with no inode lock >> + * >> + * @file: file to synchronize >> + * @start: start offset in bytes >> + * @end: end offset in bytes (inclusive) >> + * @datasync: only synchronize essential metadata if true >> + * >> + * This is a generic implementation of the fsync method for simple >> + * filesystems which track all non-inode metadata in the buffers list >> + * hanging off the address_space structure. >> + */ >> +int generic_buffer_fsync(struct file *file, loff_t start, loff_t end, >> + bool datasync) >> +{ >> + struct inode *inode = file->f_mapping->host; >> + int err; >> + int ret; >> + >> + err = file_write_and_wait_range(file, start, end); >> + if (err) >> + return err; >> + >> + ret = sync_mapping_buffers(inode->i_mapping); >> + if (!(inode->i_state & I_DIRTY_ALL)) >> + goto out; >> + if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) >> + goto out; >> + >> + err = sync_inode_metadata(inode, 1); >> + if (ret == 0) >> + ret = err; >> + >> +out: >> + /* check and advance again to catch errors after syncing out buffers */ >> + err = file_check_and_advance_wb_err(file); >> + if (ret == 0) >> + ret = err; >> + return ret; >> +} >> +EXPORT_SYMBOL(generic_buffer_fsync); >> + >> /* >> * Called when we've recently written block `bblock', and it is known that >> * `bblock' was for a buffer_boundary() buffer. This means that the block at >> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h >> index 8f14dca5fed7..3170d0792d52 100644 >> --- a/include/linux/buffer_head.h >> +++ b/include/linux/buffer_head.h >> @@ -211,6 +211,8 @@ int inode_has_buffers(struct inode *); >> void invalidate_inode_buffers(struct inode *); >> int remove_inode_buffers(struct inode *inode); >> int sync_mapping_buffers(struct address_space *mapping); >> +int generic_buffer_fsync(struct file *file, loff_t start, loff_t end, >> + bool datasync); >> void clean_bdev_aliases(struct block_device *bdev, sector_t block, >> sector_t len); >> static inline void clean_bdev_bh_alias(struct buffer_head *bh) >> -- >> 2.39.2 >> > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR