Re: [patch 9/9] mm: do_sync_mapping_range integrity fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 31, 2008 at 10:10:19AM -0400, Chris Mason wrote:
> On Fri, 2008-10-31 at 10:16 +0100, Nick Piggin wrote:
> > On Thu, Oct 30, 2008 at 04:13:44PM -0700, Andrew Morton wrote:
> > > On Wed, 29 Oct 2008 01:47:24 +1100
> > > npiggin@xxxxxxx wrote:
> > > 
> > > > Chris Mason notices do_sync_mapping_range didn't actually ask for
> > data
> > > > integrity writeout. Unfortunately, it is advertised as being
> > usable for
> > > > data integrity operations.
> > > > 
> > > > This is a data interity bug.
> > > > 
> 
> [ use WB_SYNC_ALL instead of WB_SYNC_NONE ]
> 
> > >  If the caller
> > > is using sync_file_range() for integrity then the caller has done a
> > > SYNC_FILE_RANGE_WAIT_BEFORE.
> > 
> > No disputes about whether the API works "by design". But I think the
> > implementation has a bug. I'll explain:
> 
> I'll definitely agree the current usage is clumsy, and there is a bug in
> fs/buffer.c.  A grep through the rest of the filesystems doesn't turn up
> many assumptions that WB_SYNC_NONE means it's ok to skip dirty pages.
> 
> Greps for WB_SYNC_ALL and WB_SYNC_NONE in the fs code reveal:
> 
> fs/buffer.c:__block_write_full_page()
>                if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) {
>                         lock_buffer(bh);
>                 } else if (!trylock_buffer(bh)) {
>                         redirty_page_for_writepage(wbc, page);
>                         continue;
>                 }
> 
> Easily fixed s/||/&&/, which is what XFS does.  reiser3 has the same bug
> in fs/reiserfs/inode.c
> 
> ntfs and gfs2 each have a check that assumes WB_SYNC_NONE means optional
> writeback, both seem fixable with one liners.

Well, we went over this before :) Semantics... Whether or not it can be
considered a bug, WB_SYNC_NONE is very much considered not to be a data
integrity sync in code and comments (eg. buffer.c fs-writeback.c and
file systems) for a long time.

So that's not something I can really change (especially not for stable
releases).

It's _probably_ a good idea not to fragment bios, and it's probably not
going to introduce bugs if we convert everything over. But presently,
all the bugs/misunderstandings/whatever mean that WB_SYNC_NONE is not
usable for data integrity. Obviously this has been the case for long
before sync_mapping_range was added.


> Everywhere we wait on page writeback while we're trying to build nice
> big bios hurts performance.  I'd rather see us switch to something
> closer to the do_sync_mapping_range expectation of WB_SYNC_NONE than
> sprinkle WB_SYNC_ALLs everywhere.

They already are I guess. At least, this is the only place I have seen
that needed changing. After that series, I think we get a lot closer to
being correct in a lot of the writeback paths, which is a good starting
point to make improvements. I'll be looking at the "livelock" avoidance,
but you or other fs developers probably in a better position to look at
things like waiting for locked or writeback pages...

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux