On Fri 10-02-12 10:47:16, Wu Fengguang wrote:
> On Fri, Feb 10, 2012 at 09:52:18AM +0800, Wu Fengguang wrote:
> > On Thu, Feb 09, 2012 at 01:30:27PM -0500, Chris Mason wrote:
> > > On Thu, Feb 09, 2012 at 01:06:35PM -0500, Christoph Hellwig wrote:
> > > > On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> > > > > On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > > > > > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > > > > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > > > > > handle plug in generic_file_aio_write().
> > > > > >
> > > > > > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > > > > > there is plugging at a lower layer in the writeback code then it
> > > > > > appears to me that plugging is still necessary (at least inside the
> > > > > > sync branch)....
> > > > >
> > > > > Good catch! It looks that generic_write_sync() eventually calls into
> > > > > vfs_fsync_range() which further calls ->fsync(). We may add plugging
> > > > > around it:
> > > > >
> > > >
> > > > NAK, please keep the plugging down in the fs, or the libraries used but
> > > > not common VFS code.
> > >
> > > Please, what Christoph said. At least for btrfs plugging here is wrong.
> >
> > OK, I get the point: the fs knows best when to unplug. Since any
> > higher level plug nesting will turn such low level efforts into no-op,
> > it's highly undesirable to do it in the high level.
>
> It's actually wrong to do plugging around vfs_fsync_range().
>
> Because these call paths
>
>         write() with O_SYNC
>         generic_write_sync()
>         vfs_fsync_range()
>         ->fsync()
>         generic_file_fsync()
>
>         fsync()
>         do_fsync()
>         vfs_fsync()
>         vfs_fsync_range()
>
> pass arbitrary @size arguments, which may be much larger than the
> preferable I/O size, or may cross extent/device boundaries.
>
> generic_file_fsync() starts with a filemap_write_and_wait_range()
> call, which already has proper plugging somewhere underneath. Then
> followed by metadata writes, which has plugging inside
> fsync_buffers_list(). At last, sync_inode_metadata() calls into
> ->write_inode() which may or may not care plugging.
>
> The other fs specific ->fsync() do similar steps, varying in the
> metadata and fs specific housekeeping part.
>
> I'll just drop this code. Shall the fs specific metadata I/O be
> plugged accordingly? I'm afraid this is beyond my knowledge base...

The filesystems I know (ext?, ocfs2, reiserfs, udf) either don't do any
metadata io from ->fsync (it happens from a journalling thread) or the io
is random so plugging is not desirable anyway AFAIU (well,
mpage_writepages() is clever enough to submit metadata which is interleaved
with data in one sequential stream together with the data so metadata that
remain are mostly random).

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
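
Editorial illustration (not part of the original message): the point made in
the quoted discussion, that a higher-level plug turns the filesystem's own
plugging into a no-op, can be sketched with a toy userspace model. This is not
kernel code. The names struct plug, start_plug(), submit_request() and
finish_plug() are hypothetical stand-ins, loosely modelled on the assumption
that a nested plug does not replace the one already installed for the task,
so requests keep accumulating on the outer plug until the outer caller
finishes it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's block layer API. */
struct plug {
	int queued;			/* requests held back by this plug */
};

static struct plug *current_plug;	/* stands in for the task's active plug */
static int submitted;			/* requests that reached the "device" */

static void reset_model(void)
{
	current_plug = NULL;
	submitted = 0;
}

static void start_plug(struct plug *p)
{
	assert(p != NULL);
	p->queued = 0;
	if (!current_plug)		/* assumption: a nested plug is not installed */
		current_plug = p;
}

static void submit_request(void)
{
	if (current_plug)
		current_plug->queued++;	/* held until the active plug is finished */
	else
		submitted++;		/* no plug: dispatched immediately */
}

static void finish_plug(struct plug *p)
{
	assert(p != NULL);
	submitted += p->queued;		/* flush whatever this plug holds */
	p->queued = 0;
	if (p == current_plug)
		current_plug = NULL;
}

/* The fs plugs on its own: its unplug dispatches the batch right away.
 * Returns how many requests were dispatched when the fs unplugged. */
static int fs_plug_only(int n)
{
	struct plug fs;
	int i;

	reset_model();
	start_plug(&fs);
	for (i = 0; i < n; i++)
		submit_request();
	finish_plug(&fs);
	return submitted;
}

/* Same fs code, but called under a higher-level plug: the fs unplug
 * dispatches nothing, everything waits for the outer finish_plug(). */
static int fs_plug_inside_outer(int n)
{
	struct plug outer, fs;
	int i, seen_at_fs_unplug;

	reset_model();
	start_plug(&outer);
	start_plug(&fs);
	for (i = 0; i < n; i++)
		submit_request();
	finish_plug(&fs);
	seen_at_fs_unplug = submitted;
	finish_plug(&outer);
	return seen_at_fs_unplug;
}
```

In this model fs_plug_only(4) reports all four requests dispatched at the
moment the fs unplugs, while fs_plug_inside_outer(4) reports none at that
moment: the fs has lost control over when its I/O is issued, which is the
objection raised against plugging in common VFS code.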