Re: [patch 2/2] fs: fix page_mkwrite error cases in core code and btrfs

On Thu, Mar 12, 2009 at 04:03:57PM -0700, Sage Weil wrote:
> On Thu, 12 Mar 2009, Trond Myklebust wrote:
> > On Wed, 2009-03-11 at 04:55 +0100, Nick Piggin wrote:
> > > page_mkwrite is called with neither the page lock nor the ptl held. This
> > > means a page can be concurrently truncated or invalidated out from underneath
> > > it. Callers are supposed to prevent truncate races themselves; however,
> > > previously the only thing they could do in case they hit one was to raise a
> > > SIGBUS. A SIGBUS is wrong for the case that the page has been invalidated
> > > or truncated within i_size (e.g. hole punched). Callers may also have to
> > > perform memory allocations in this path, where again, SIGBUS would be wrong.
> > > 
> > > The previous patch made it possible to properly specify errors. Convert
> > > the generic buffer.c code and btrfs to return sane error values
> > > (in the case of page removed from pagecache, VM_FAULT_NOPAGE will cause the
> > > fault handler to exit without doing anything, and the fault will be retried 
> > > properly).
> > > 
> > > This fixes core code, and converts btrfs as a template/example. All other
> > > filesystems defining their own page_mkwrite should be fixed in a similar
> > > manner.
> > 
> > There appears to be another atomicity problem in the same area of
> > code...
> > 
> > The lack of locking between the call to ->page_mkwrite() and the
> > subsequent call to set_page_dirty_balance() means that the filesystem
> > may actually already have written out the page by the time you get round
> > to calling set_page_dirty_balance().
> 
> We were just banging our heads against this issue last week.

That's coming too:
http://marc.info/?l=linux-fsdevel&m=123555461816471&w=2

(we ended up deciding to call ->page_mkwrite with the page unlocked and have
it return with the page locked, as that solves locking problems in some
filesystems).

I'll resend that patch soonish. Hopefully it will work for you two?
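
The convention above (->page_mkwrite is entered with the page unlocked and returns with it locked), together with the error mapping from the patch (truncated or invalidated page means "retry", allocation failure is not a SIGBUS), can be illustrated with a small userspace model. This is a minimal sketch, not kernel code: `struct page`, `struct mapping`, the `FAULT_*` codes and the `prepare_*` hooks are hypothetical stand-ins for the real `struct page`, `address_space` and `VM_FAULT_*` values.

```c
/*
 * Hypothetical userspace model (not kernel code) of a ->page_mkwrite
 * handler: called with the page unlocked, returns it locked on success,
 * and maps each failure to a distinct fault code.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for VM_FAULT_* codes; values are arbitrary. */
enum fault_ret {
    FAULT_LOCKED = 1, /* success: page returned locked */
    FAULT_NOPAGE,     /* page truncated/invalidated: VM just retries */
    FAULT_OOM,        /* allocation failed: not a SIGBUS */
    FAULT_SIGBUS,     /* any other error */
};

struct mapping { int id; };

struct page {
    struct mapping *mapping; /* cleared when truncated or invalidated */
    bool locked;
    bool fs_ready;           /* hypothetical per-page fs state */
};

/* Hypothetical filesystem hooks, for illustration only. */
static int prepare_ok(struct page *p) { p->fs_ready = true; return 0; }
static int prepare_nomem(struct page *p) { (void)p; return -ENOMEM; }
static int prepare_eio(struct page *p) { (void)p; return -EIO; }

static enum fault_ret model_page_mkwrite(struct page *page,
                                         const struct mapping *expected,
                                         int (*prepare)(struct page *))
{
    assert(!page->locked);   /* entry: page unlocked */
    page->locked = true;     /* lock_page() in the real code */

    /* Re-check under the lock: truncate or hole punch may have raced. */
    if (page->mapping != expected) {
        page->locked = false;
        return FAULT_NOPAGE;
    }

    int err = prepare(page);
    if (err != 0) {
        page->locked = false;
        return err == -ENOMEM ? FAULT_OOM : FAULT_SIGBUS;
    }
    return FAULT_LOCKED;     /* exit: page still locked */
}
```

In this model only the success path leaves the page locked, so the caller knows it must unlock exactly when it sees `FAULT_LOCKED`; every error path unlocks before returning.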


> Among other things, if ->set_page_dirty sets up anything in page->private, 
> you can get an ->invalidatepage on a non-dirty page (which confused the 
> hell out of me until I realized do_wp_page() was calling set_page_dirty 
> too).
> 
> > How then is the filesystem supposed to guarantee that whatever structure
> > it allocated in page_mkwrite() is still around when the page gets marked
> > as dirty a second time?
> 
> Can page_mkwrite() be made responsible for marking the page dirty, instead 
> of doing it from do_wp_page()?  That would allow the fs to do the dirtying 
> under the protection of the page lock, or whatever other internal locking 
> scheme it has.  That's how the regular write path works, and it would be 
> nice to be able to just call write_{begin,end} from ->page_mkwrite() (as 
> at least ext4 does) without being followed by a second racy call to 
> ->set_page_dirty()...

No, because the VM also needs to cover races where the page is dirtied
after the pte is made writable.
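
One way to see why the final dirtying belongs to the VM, done while the page is still locked, is a small ordering model. This is a hypothetical userspace sketch, not kernel code: `struct page` here only tracks three flags, `try_writeback` stands in for writeback cleaning and write-protecting a page, and the invariant checked is that a writable pte should never map a clean page.

```c
/*
 * Hypothetical userspace model (not kernel code) of write-fault ordering.
 * Invariant of interest: a writable pte must never map a clean page.
 */
#include <assert.h>
#include <stdbool.h>

struct page {
    bool locked;
    bool dirty;
    bool pte_writable;
};

/* Stand-in for writeback: must hold the page lock to clean the page. */
static void try_writeback(struct page *p)
{
    if (p->locked)
        return;                  /* real code would block in lock_page() */
    if (p->dirty) {
        p->dirty = false;        /* clear dirty for I/O ...           */
        p->pte_writable = false; /* ... and write-protect the mapping */
    }
}

/* Ordering with the page kept locked from ->page_mkwrite to set_page_dirty. */
static void model_locked_fault(struct page *p, void (*race)(struct page *))
{
    assert(!p->locked);
    p->locked = true;        /* ->page_mkwrite returned with the page locked */
    race(p);                 /* a writeback attempt here is excluded */
    p->pte_writable = true;  /* VM makes the pte writable */
    race(p);                 /* ... and here too */
    p->dirty = true;         /* set_page_dirty() under the page lock */
    p->locked = false;       /* unlock_page() */
}

/* Broken ordering: fs dirties in ->page_mkwrite, no lock held afterwards. */
static void model_fs_dirties_first(struct page *p, void (*race)(struct page *))
{
    p->dirty = true;         /* fs dirties the page itself */
    race(p);                 /* writeback cleans it and write-protects */
    p->pte_writable = true;  /* VM installs a writable pte on a clean page */
}

static bool invariant_holds(const struct page *p)
{
    return !p->pte_writable || p->dirty;
}
```

In `model_fs_dirties_first`, writeback slips in between the filesystem's dirtying and the VM installing the pte, so the model ends with a writable pte over a clean page and later stores would never be written back. Keeping the page locked from ->page_mkwrite until after set_page_dirty(), as in `model_locked_fault`, closes that window in this model.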

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
