Re: [patch 7/9] mm: write_cache_pages terminate quickly

On Thu, Oct 30, 2008 at 04:07:46PM -0700, Andrew Morton wrote:
> On Wed, 29 Oct 2008 01:47:22 +1100
> npiggin@xxxxxxx wrote:
> 
> > Terminate the write_cache_pages loop upon encountering the first page past
> > end, without locking the page. Pages cannot have their index change when we
> > have a reference on them (truncate, eg truncate_inode_pages_range performs
> > the same check without the page lock).
> > 
> 
> Traditionally lock_page() is used to stabilise ->index and ->mapping. 

Well, ->mapping, yes. ->index is of course irrelevant without ->mapping,
*except* as a "where did we get to" marker. But it has been used that
way for a long time.


> Here you introduce a new and very subtle sort-of-locking rule without
> actually really introducing it at all.  OK, there's a little comment
> buried way down in this function.  But there's a contradictory comment
> over truncate_inode_pages_range() ("When looking at...").

That comment is actually wrong: ->index won't change. If it could change
at random, we could skip pages here whenever a page's index jumped
forwards. The pagevec pagecache tag lookup functions would be broken in
general, actually.
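To make that dependency concrete, here is a userspace model, not kernel
code. The struct model_page type and the model_lookup()/model_walk()
helpers are hypothetical stand-ins for the pagevec tag lookups: each
lookup returns up to 'batch' pages with index >= start, and the walk
resumes from the last page's index + 1. The 'bump' parameter simulates an
index jumping forwards while the page is held, which is exactly the case
the rule forbids.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only ->index matters here. */
struct model_page {
	unsigned long index;
};

/*
 * Hypothetical gang lookup: store up to 'max' pointers to pages whose
 * index is >= 'start' into 'out', scanning 'pages' in array order
 * (callers keep the array sorted by index, as the radix tree would be).
 */
static size_t model_lookup(struct model_page *pages, size_t npages,
			   unsigned long start, struct model_page **out,
			   size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < npages && n < max; i++)
		if (pages[i].index >= start)
			out[n++] = &pages[i];
	return n;
}

/*
 * Visit every page in batches, resuming after the last page returned.
 * If 'bump' is nonzero, the last page of the first batch has its index
 * moved forwards before the resume point is computed (the forbidden
 * case). Returns the number of pages visited.
 */
static size_t model_walk(struct model_page *pages, size_t npages,
			 size_t batch, unsigned long bump)
{
	struct model_page *vec[16];
	unsigned long start = 0;
	size_t visited = 0, n;
	int first = 1;

	assert(batch > 0 && batch <= 16);
	while ((n = model_lookup(pages, npages, start, vec, batch)) > 0) {
		visited += n;
		if (first && bump)
			vec[n - 1]->index += bump;
		first = 0;
		/* Avoid wrapping the cursor back to 0. */
		if (vec[n - 1]->index == ULONG_MAX)
			break;
		/* The resume cursor trusts ->index to be stable. */
		start = vec[n - 1]->index + 1;
	}
	return visited;
}
```

With pages at indices 0, 1, 2, 5 and 9 and a batch of 2, a walk with
bump == 0 visits all five pages. Bumping the second page's index by 100
after the first batch makes the walk stop after two, silently skipping
2, 5 and 9.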

 
> How do we make this new locking rule maintainable?  How do we avoid
> breaking it in the future?  How do we prevent accidental breakage from
> slipping past developers' and reviewers' attention?

It's actually fairly fundamental, even more fundamental than the
functions I quoted above.

If we have any place that does:

	lock_page(page);
	if (!page->mapping) /* truncate got to it */

but does not then check the page's index (which most don't), the page
could have moved away from where we first found it (which would not
always be a bug, but often would be).
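A minimal single-threaded sketch of the difference, with hypothetical
model_* types and a plain flag standing in for the page lock: the common
pattern only rechecks ->mapping after locking, while a fully defensive
caller would also recheck ->index. The second comparison is only
redundant because ->index cannot change while we hold a reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mapping {
	int id;
};

/* Hypothetical page: mapping is NULL once truncated. */
struct model_page {
	struct model_mapping *mapping;
	unsigned long index;
	bool locked;
};

/* Single-threaded stand-ins for lock_page()/unlock_page(): no waiting. */
static void model_lock_page(struct model_page *page)
{
	assert(!page->locked);
	page->locked = true;
}

static void model_unlock_page(struct model_page *page)
{
	assert(page->locked);
	page->locked = false;
}

/* The common pattern: only ->mapping is rechecked after locking. */
static bool model_lock_check_mapping(struct model_page *page,
				     struct model_mapping *mapping)
{
	model_lock_page(page);
	if (page->mapping != mapping) {	/* truncate got to it */
		model_unlock_page(page);
		return false;
	}
	return true;
}

/* Defensive variant: also confirm the page is still at 'index'. */
static bool model_lock_and_revalidate(struct model_page *page,
				      struct model_mapping *mapping,
				      unsigned long index)
{
	model_lock_page(page);
	if (page->mapping != mapping || page->index != index) {
		model_unlock_page(page);
		return false;
	}
	return true;
}
```

If a page looked up at index 7 were allowed to move to index 8 while a
reference is held, model_lock_check_mapping() would still report success,
which is the hazard described above.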

The read(2) syscall doesn't lock the page by default either. Having the
page move somewhere else underneath it would be a disaster.

I guess it's not explicitly documented AFAIKS, but I thought it was a
hard rule. Is there anywhere useful we could write it down that people
will actually read?

OTOH, there aren't a lot of places that could be doing this. Some wild
filesystem might think it owns the pagecache, I guess. I know that when
it came up in splice, I told Jens we can't move a page that has
references on it, even if it is locked...
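Sketched as a hypothetical helper: model_try_move() and its
'expected_refs' argument are assumptions for illustration, loosely in
the spirit of the reference-count check page migration performs before
replacing a page. Holding the page lock is necessary but not sufficient,
because a lockless reader may still hold a reference and depend on
->index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page: refcount covers the pagecache and any lookups. */
struct model_page {
	int refcount;
	bool locked;
	unsigned long index;
};

/*
 * Move a locked page to 'new_index' only if the reference count is
 * exactly what the caller expects (e.g. pagecache + caller). Any extra
 * reference may belong to an unlocked reader relying on ->index, so the
 * move is refused.
 */
static bool model_try_move(struct model_page *page, unsigned long new_index,
			   int expected_refs)
{
	assert(page->locked);
	if (page->refcount != expected_refs)
		return false;
	page->index = new_index;
	return true;
}
```

In this model a locked page with only the expected two references can
be moved, but as soon as a third reference appears the move is refused
and ->index stays put.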

 
> Given the additional maintenance burdens, is this change worth doing
> at all?
> 
> 
> > ---
> > Index: linux-2.6/mm/page-writeback.c
> > ===================================================================
> > --- linux-2.6.orig/mm/page-writeback.c
> > +++ linux-2.6/mm/page-writeback.c
> > @@ -911,15 +911,24 @@ retry:
> >  		for (i = 0; i < nr_pages; i++) {
> >  			struct page *page = pvec.pages[i];
> >  
> > -			done_index = page->index + 1;
> > -
> >  			/*
> > -			 * At this point we hold neither mapping->tree_lock nor
> > -			 * lock on the page itself: the page may be truncated or
> > -			 * invalidated (changing page->mapping to NULL), or even
> > -			 * swizzled back from swapper_space to tmpfs file
> > -			 * mapping
> > +			 * At this point, the page may be truncated or
> > +			 * invalidated (changing page->mapping to NULL), or
> > +			 * even swizzled back from swapper_space to tmpfs file
> > +			 * mapping. However, page->index will not change
> > +			 * because we have a reference on the page.
> >  			 */
> > +			if (page->index > end) {
> > +				/*
> > +				 * can't be range_cyclic (1st pass) because
> > +				 * end == -1 in that case.
> > +				 */
> > +				done = 1;
> > +				break;
> > +			}
> > +
> > +			done_index = page->index + 1;
> > +
> >  			lock_page(page);
> >  
> >  			/*
> > @@ -936,15 +945,6 @@ continue_unlock:
> >  				continue;
> >  			}
> >  
> > -			if (page->index > end) {
> > -				/*
> > -				 * can't be range_cyclic (1st pass) because
> > -				 * end == -1 in that case.
> > -				 */
> > -				done = 1;
> > -				goto continue_unlock;
> > -			}
> > -
> >  			if (!PageDirty(page)) {
> >  				/* someone wrote it for us */
> >  				goto continue_unlock;
> > 
> > -- 
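The reordering in the patch can be modelled in userspace as below.
model_pass() and struct model_page are hypothetical; pvec is assumed to
be in ascending index order, as a tagged lookup returns it, and
lock_count stands in for lock_page()/unlock_page(). The check against
'end' happens before any locking, so the first page past 'end' and
everything after it in the batch are never locked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page: lock_count records how often it was locked. */
struct model_page {
	unsigned long index;
	int lock_count;
};

/*
 * One pagevec pass with the patch applied. *done and *done_index mirror
 * the variables in write_cache_pages(). Returns how many pages would
 * have been locked (and possibly written).
 */
static size_t model_pass(struct model_page **pvec, size_t nr_pages,
			 unsigned long end, int *done,
			 unsigned long *done_index)
{
	size_t locked = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		struct model_page *page = pvec[i];

		/* ->index is stable while we hold a reference: no lock. */
		if (page->index > end) {
			*done = 1;
			break;
		}
		*done_index = page->index + 1;
		page->lock_count++;	/* lock_page() ... unlock_page() */
		locked++;
	}
	return locked;
}
```

With pages at 1, 4, 8 and 9 and end == 5, only the first two are locked,
done is set, and done_index ends up as 5.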