On Fri, 2008-10-31 at 10:16 +0100, Nick Piggin wrote: > On Thu, Oct 30, 2008 at 04:13:44PM -0700, Andrew Morton wrote: > > On Wed, 29 Oct 2008 01:47:24 +1100 > > npiggin@xxxxxxx wrote: > > > > > Chris Mason notices do_sync_mapping_range didn't actually ask for > data > > > integrity writeout. Unfortunately, it is advertised as being > usable for > > > data integrity operations. > > > > > > This is a data interity bug. > > > [ use WB_SYNC_ALL instead of WB_SYNC_NONE ] > > If the caller > > is using sync_file_range() for integrity then the caller has done a > > SYNC_FILE_RANGE_WAIT_BEFORE. > > No disputes about whether the API works "by design". But I think the > implementation has a bug. I'll explain: I'll definitely agree the current usage is clumsy, and there is a bug in fs/buffer.c. A grep through the rest of the filesystems doesn't turn up many assumptions that WB_SYNC_NONE means it's ok to skip dirty pages. Greps for WB_SYNC_ALL and WB_SYNC_NONE in the fs code reveal: fs/buffer.c:__block_write_full_page() if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) { lock_buffer(bh); } else if (!trylock_buffer(bh)) { redirty_page_for_writepage(wbc, page); continue; } Easily fixed s/||/&&/, which is what XFS does. reiser3 has the same bug in fs/reiserfs/inode.c ntfs and gfs2 each have a check that assumes WB_SYNC_NONE means optional writeback, both seem fixable with one liners. Everywhere we wait on page writeback while we're trying to build nice big bios hurts performance. I'd rather see us switch to something closer to the do_sync_mapping_range expectation of WB_SYNC_NONE than sprinkle WB_SYNC_ALLs everywhere. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html