On Fri, Oct 31, 2008 at 10:10:19AM -0400, Chris Mason wrote: > On Fri, 2008-10-31 at 10:16 +0100, Nick Piggin wrote: > > On Thu, Oct 30, 2008 at 04:13:44PM -0700, Andrew Morton wrote: > > > On Wed, 29 Oct 2008 01:47:24 +1100 > > > npiggin@xxxxxxx wrote: > > > > > > > Chris Mason notices do_sync_mapping_range didn't actually ask for > > data > > > > integrity writeout. Unfortunately, it is advertised as being > > usable for > > > > data integrity operations. > > > > > > > > This is a data interity bug. > > > > > > [ use WB_SYNC_ALL instead of WB_SYNC_NONE ] > > > > If the caller > > > is using sync_file_range() for integrity then the caller has done a > > > SYNC_FILE_RANGE_WAIT_BEFORE. > > > > No disputes about whether the API works "by design". But I think the > > implementation has a bug. I'll explain: > > I'll definitely agree the current usage is clumsy, and there is a bug in > fs/buffer.c. A grep through the rest of the filesystems doesn't turn up > many assumptions that WB_SYNC_NONE means it's ok to skip dirty pages. > > Greps for WB_SYNC_ALL and WB_SYNC_NONE in the fs code reveal: > > fs/buffer.c:__block_write_full_page() > if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) { > lock_buffer(bh); > } else if (!trylock_buffer(bh)) { > redirty_page_for_writepage(wbc, page); > continue; > } > > Easily fixed s/||/&&/, which is what XFS does. reiser3 has the same bug > in fs/reiserfs/inode.c > > ntfs and gfs2 each have a check that assumes WB_SYNC_NONE means optional > writeback, both seem fixable with one liners. Well, we went over this before :) Semantics... Whether or not it can be considered a bug, WB_SYNC_NONE is very much considered not to be a data integrity sync in code and comments (eg. buffer.c fs-writeback.c and file systems) for a long time. So that's not something I can really change (especially not for stable releases). It's _probably_ a good idea not to fragment bios, and it's probably not going to introduce bugs if we convert everything over. But presently, all the bugs/misunderstandings/whatever mean that WB_SYNC_NONE is not usable for data integrity. Obviously this has been the case for long before sync_mapping_range was added. > Everywhere we wait on page writeback while we're trying to build nice > big bios hurts performance. I'd rather see us switch to something > closer to the do_sync_mapping_range expectation of WB_SYNC_NONE than > sprinkle WB_SYNC_ALLs everywhere. They already are I guess. At least, this is the only place I have seen that needed changing. After that series, I think we get a lot closer to being correct in a lot of the writeback paths, which is a good starting point to make improvements. I'll be looking at the "livelock" avoidance, but you or other fs developers probably in a better position to look at things like waiting for locked or writeback pages... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html