Re: Kernel config option which causes reiser4 to be instable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07 January 2013 01:06:36 Edward Shishkin wrote:
> On 12/29/2012 07:47 PM, Ivan Shapovalov wrote:
> > On 29 December 2012 01:24:48 Edward Shishkin wrote:
> >> On 12/26/2012 05:22 PM, Ivan Shapovalov wrote:
> >>> Hello Edward,
> >> 
> >> Hi Ivan,
> >> 
> >>> I've apparently managed to get a working implementation of
> >>> ->migratepage().
> >>> 
> >>> I'm attaching a patch; it seems stable, but I'm worried somewhat because
> >>> the destination pages (parameter newpage) sometimes (pretty often) have
> >>> a
> >>> non-NULL private field.
> >>> 
> >>> Currently I've put there a warning and a set_page_private(newpage, 0) -
> >>> it
> >>> seems to work, but well...
> >>> I'm continuing to test it and will report if that actually has bad
> >>> consequences.
> >>> 
> >>> - Ivan
> >>> 
> >>>>>   Edward.
> >>>>>   
> >>>>>>   >   Thanks,
> >>>>>>   >   Ivan.
> >>>>>>   >   
> >>>>>>>   >>   Also before the release I'll try to take a look at this:
> >>>>>>>   >>   http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
> >>>>>>>   >>   
> >>>>>>>   >>   This failed path might indicate that we adjusted to
> >>>>>>>   >>   fs-writeback
> >>>>>>>   >>   incorrectly.
> >>>>>>>   >>   
> >>>>>>>   >>   Edward.
> >>>>>>>   >>   
> >>>>>>>>   >>>   Regards,
> >>>>>>>>   >>>   Ivan.
> >>>>>>>>   >>>   
> >>>>>>>>>>>>   >>>>>>>         on kernel
> >>>>>>>>>>>>   >>>>>>>   
> >>>>>>>>>>>>   >>>>>>>   3.6.10, and everything seems to be OK so far (so
> >>>>>>>>>>>>   >>>>>>>   the
> >>>>>>>>>>>>   >>>>>>>   workaround is
> >>>>>>>>>>>>   >>>>>>>   version-
> >>>>>>>>>>>>   >>>>>>>   agnostic).
> >>>>>>>>>>>>   >>>>>>>   
> >>>>>>>>>>>>   >>>>>>>   Edward, are there any guesses on what can make
> >>>>>>>>>>>>   >>>>>>>   reiser4
> >>>>>>>>>>>>   >>>>>>>   choke on
> >>>>>>>>>>>>   >>>>>>>   hugepages/compaction/migration?
> >>>>>>>>>>>   >>>>>>   
> >>>>>>>>>>>   >>>>>>   TBH, no ideas. They (hugepages) are_transparent_.
> >>>>>>>>>>>   >>>>>>   It means we shouldn't suffer in theory;)
> >>>>>>>>>>>   >>>>>>   
> >>>>>>>>>>>>   >>>>>>>         I'm not even barely familiar with the kernel
> >>>>>>>>>>>>   >>>>>>>   
> >>>>>>>>>>>>   >>>>>>>   internals.
> >>>>>>>>>>>>   >>>>>>>   
> >>>>>>>>>>>>   >>>>>>>   Thanks,
> >>>>>>>>>>>>   >>>>>>>   Ivan.
> >>> 
> >>> reiser4-migratepage.patch
> >>> 
> >>> 
> >>> diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
> >>> index 8d8a37d..20006fc 100644
> >>> --- a/fs/reiser4/as_ops.c
> >>> +++ b/fs/reiser4/as_ops.c
> >>> @@ -43,6 +43,7 @@
> >>> 
> >>>    #include<linux/backing-dev.h>
> >>>    #include<linux/quotaops.h>
> >>>    #include<linux/security.h>
> >>> 
> >>> +#include<linux/migrate.h>
> >>> 
> >>>    /* address space operations */
> >>> 
> >>> @@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t
> >>> gfp
> >>> UNUSED_ARG)>
> >>> 
> >>>    	}
> >>>    
> >>>    }
> >>> 
> >>> +int reiser4_migratepage(struct address_space *mapping, struct page
> >>> *newpage, +			struct page *page, enum migrate_mode mode)
> >>> +{
> >>> +	jnode *node;
> >>> +	int result;
> >>> +
> >>> +	assert("???-1", PageLocked(page));
> >>> +	assert("???-2", !PageWriteback(page));
> >>> +	assert("???-3", reiser4_schedulable());
> >>> +
> >>> +	if (PageDirty(page)) {
> >>> +		/*
> >>> +		 * Logic from migrate.c:fallback_migrate_page()
> >>> +		 * Only writeback pages in full synchronous migration
> >>> +		 */
> >>> +		if (mode != MIGRATE_SYNC)
> >>> +			return -EBUSY;
> >>> +		return write_one_page(page, true)<   0 ? -EIO : -EAGAIN;
> >> 
> >> Why? The migrate.c's writeout() doesn't work?
> >> I see it performs some cleanups..
> > 
> > Hi,
> > 
> > Because it's static. :)
> > But OK, I exported writeout() and used it.
> > 
> >>> +	}
> >>> +
> >>> +	assert("???-4", !PageDirty(page));
> >>> +
> >>> +	assert("???-5", page->mapping != NULL);
> >>> +	assert("???-6", page->mapping->host != NULL);
> >>> +
> >>> +	/*
> >>> +	 * Iteration 1: release page before default migration by
> >>> migrate_page().
> >>> +	 * Iteration 2: check if the page is releasable; if true, directly
> >>> replace +	 *              jnode's page pointer instead of releasing,
> >>> then
> >>> migrate using +	 *              default migrate_page().
> >>> +	 */
> >>> +
> >>> +	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
> >>> +	 * The number of remaining references must be:
> >>> +	 * 1 for anonymous pages without a mapping
> >>> +	 * 2 for pages with a mapping
> >>> +	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
> >>> +	 */
> >>> +
> >>> +	if (page_count(page)>   (PagePrivate(page) ? 3 : 2))
> >> 
> >> OK
> >> 
> >>> +		return -EIO;
> >> 
> >> -EAGAIN would be better (like fallback_migrate_page() returns in
> >> the case of "non-releasable")
> > 
> > Fixed.
> > 
> >>> +
> >>> +	/*
> >>> +	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.
> >> 
> >> "vfs-anonymous" are pages with page->mapping == NULL;
> >> 
> >> "reiser4-anonymous" are pages which are dirtied via ->mmap(),
> >> but not yet captured by reiser4 transaction manager.
> >> 
> >> Note, that none of those cases takes place at this point.
> > 
> > Hm, indeed that's an insane comment. Fixed.
> > 
> >>> +	 * If any, just migrate them using default routine.
> >>> +	 */
> >>> +
> >>> +	if (!PagePrivate(page))
> >>> +		return migrate_page(mapping, newpage, page, mode);
> >> 
> >> OK
> >> 
> >>> +
> >>> +	node = jnode_by_page(page);
> >>> +	assert("???-7", node != NULL);
> >>> +
> >>> +	/* releasable() needs jnode lock, because it looks at the jnode fields
> >>> +	 * and we need jload_lock here to avoid races with jload(). */
> >>> +	spin_lock_jnode(node);
> >>> +	spin_lock(&(node->load));
> >>> +	if (jnode_is_releasable(node)) {
> >>> +		jref(node);
> >> 
> >> OK
> >> 
> >>> +
> >>> +		/* there is no need to synchronize against
> >>> +		 * jnode_extent_write() here, because pages seen by
> >>> +		 * jnode_extent_write() are !releasable(). */
> >>> +		page_clear_jnode(page, node);
> >>> +
> >>> +		if(jprivate(newpage)) {
> >> 
> >> Hmm, strange: the newpage is freshly allocated and locked..
> >> Somebody doesn't clean up after himself ???  Don't have other
> >> ideas..
> > 
> > Yes, apparently. There are no calls to set_page_private() in compaction.c
> > or migrate.c (actually, in last file there are some, but they do not get
> > called in observed pathes).
> > 
> >>> +			// FIXME: warning or what? happens on a regular basis, behavior 
is
> >>> unaffected. +			warning("???-10", "Migration destination page has a
> >>> non-NULL private field (%x) - resetting it", page_private(newpage));
> >> 
> >> Does it look like a pointer, which can be dereferenced?
> >> (print it better in %p format). If so, try to detect a
> >> jnode at this address by jnode's magic (JMAGIC 0x52654973,
> >> present only when debug is on). To make sure we are not
> >> the culprit..
> > 
> > Cannot be dereferenced. Proved by a handful of oopses...
> > The addresses (as printed by %p) look like "ffffea0002419800" and then
> > I get pagefault oopses for addresses like "0000000000001010".
> 
> This is compaction manager, who provides pages with not cleared
> private info. Lustre had the same problem. I've discussed with VS,
> and here is his original explanation:
> 
> Vladimir Saveliev wrote:
>  > migrate_pages() used to allocate new pages with a function passed as a
>  > parameter.
>  > 
>  > int migrate_pages(struct list_head *from,
>  > new_page_t get_new_page, unsigned long private, bool offlining,
>  > bool sync)
>  > 
>  > In most cases allocating function passed to migrate_pages() ends up
>  > with __alloc_pages() which calls get_page_from_freelist()
>  > ->..-> prep_new_page() which sets page->private to 0.
>  > 
>  > There is one exception however. In case of compact_zone() (introduced
>  > in rhel6 kernels) allocating function is compaction_alloc().
>  > 
>  > This function seems to avoid traditional page allocation path, it
>  > takes free pages from isolated free lists and page->private does not
>  > get set to 0.
>  > 
>  > Then migrate_page_move_mapping() puts that new page into mapping's
>  > page tree where lustre's ll_read_ahead_page() finds nonprivate page
>  > with page->private != 0 and oops-es.
>  > 
> >>> +			set_page_private(newpage, 0ul);
> >>> +		}
> >>> +
> >>> +		result = migrate_page(mapping, newpage, page, mode);
> >>> +		if (unlikely(result)) {
> >>> +			jnode_attach_page(node, page); /* migration failed - reattach 
the
> >>> old
> >>> page */ +		} else {
> >>> +			jnode_attach_page(node, newpage);
> >>> +		}
> >>> +
> >>> +		jput(node);
> >>> +
> >>> +		spin_unlock(&(node->load));
> >>> +		spin_unlock_jnode(node);
> >>> +		assert("???-9", reiser4_schedulable());
> >>> +		return result;
> >>> +	} else {
> >>> +		spin_unlock(&(node->load));
> >>> +		spin_unlock_jnode(node);
> >>> +		assert("???-8", reiser4_schedulable());
> >>> +		return -EIO;
> >> 
> >> -EAGAIN
> > 
> > Fixed.
> > 
> >>> +	}
> >>> +}
> >>> +
> >> 
> >> So simply releasing the page (by default migration routine) doesn't
> >> work, while such transfer of the relationship (page, jnode) to
> >> another pair (newpage, jnode) does work? Can not understand this..
> >> 
> >> E.g. vmscanner (vmscan.c), which also releases releasable page, but
> >> doesn't provide a newpage instead (in contrast with migration) works
> >> perfectly..
> > 
> > /me just looked at vmscan.c, which also skips ->releasepage() when there
> > is
> > no private data on page, and wonders how did that ever work with reiser4.
> 
> It does right things: ->releasepage() is only to free all resources
> related to private info. Moreover, being called with zeroed private,
> ->releasepage() will oops.
> 
> Sorry for confusion: I was unhappy that default migratepage doesn't
> check reference counter. Actually, it does: migrate_page() calls
> migrate_page_move_mapping(), which checks if the counter is
> 2 + page_has_private(page).
> 
> I think that default migratepage (fallback_migrate_page) must work for
> all file systems. For some of them it can be suboptimal, so migration
> developers provided possibility to supply an optimal one via
> ->migratepage asop. I am sure we'll eventually understand why default
> migratepage doesn't work for reiser4..


Hm... I looked at that function (migrate_page_move_mapping()) before and 
somehow became convinced that it does not check refcount.
And now I have completely no ideas. :)

> 
> > So I'd say there is nothing strange in that the fallback_migrate_page()
> > does not work while our custom one works (it's all about remembering
> > about non-PagePrivate pages); instead, it is strange that the vmscanner
> > works while fallback_migrate_page() does not.
> > 
> > Maybe vmscanner finds out that the page is used by some other means?..
> > 
> >> Did you stress it well enough?
> > 
> > Yes, I think so. Parallel building + various git operations + databases +
> > virtual machines.> 
> >> I think we'll release reiser4-for-3.7 with fail_migrate_page (to fix
> >> a stability point), and continue to play with migration..
> > 
> > Maybe.. What are the next targets to play?
> 
> It depends on your preferences. What do you want?
> Resolve performance issues? Add new features? Let me know and
> I'll shed more light to the specified item...

Yes, I'd like to work on a couple of things. I'll start a new thread then.

Thanks,
Ivan.

> 
> Thanks,
> Edward.
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux File System Development]     [Linux BTRFS]     [Linux NFS]     [Linux Filesystems]     [Ext4 Filesystem]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Resources]

  Powered by Linux