On Fri, Feb 10, 2012 at 09:52:18AM +0800, Wu Fengguang wrote:
> On Thu, Feb 09, 2012 at 01:30:27PM -0500, Chris Mason wrote:
> > On Thu, Feb 09, 2012 at 01:06:35PM -0500, Christoph Hellwig wrote:
> > > On Thu, Feb 09, 2012 at 04:02:24PM +0800, Wu Fengguang wrote:
> > > > On Thu, Feb 09, 2012 at 10:27:19AM +1100, Dave Chinner wrote:
> > > > > On Wed, Feb 08, 2012 at 07:01:44PM +0800, Wu Fengguang wrote:
> > > > > > Buffered write(2) is not directly tied to IO, so it's not suitable to
> > > > > > handle plug in generic_file_aio_write().
> > > > >
> > > > > But generic_sync_write() does issue IO for O_SYNC writes, so unless
> > > > > there is plugging at a lower layer in the writeback code then it
> > > > > appears to me that plugging is still necessary (at least inside the
> > > > > sync branch)....
> > > >
> > > > Good catch! It looks that generic_write_sync() eventually calls into
> > > > vfs_fsync_range() which further calls ->fsync(). We may add plugging
> > > > around it:
> > > >
> > >
> > > NAK, please keep the plugging down in the fs, or the libraries used but
> > > not common VFS code.
> >
> > Please, what Christoph said. At least for btrfs plugging here is wrong.
>
> OK, I get the point: the fs knows best when to unplug. Since any
> higher level plug nesting will turn such low level efforts into no-op,
> it's highly undesirable to do it in the high level.

It's actually wrong to do plugging around vfs_fsync_range(), because
these call paths

        write() with O_SYNC
          generic_write_sync()
            vfs_fsync_range()
              ->fsync()
                generic_file_fsync()

        fsync()
          do_fsync()
            vfs_fsync()
              vfs_fsync_range()

pass arbitrary @size arguments, which may be much larger than the
preferred I/O size or may cross extent/device boundaries.

generic_file_fsync() starts with a filemap_write_and_wait_range() call,
which already has proper plugging somewhere underneath. This is followed
by metadata writes, which have plugging inside fsync_buffers_list().
Finally, sync_inode_metadata() calls into ->write_inode(), which may or
may not care about plugging. The other fs-specific ->fsync()
implementations take similar steps, varying in the metadata and
fs-specific housekeeping parts.

I'll just drop this code. Should the fs-specific metadata I/O be plugged
accordingly? I'm afraid this is beyond my knowledge base...

Thanks,
Fengguang