Re: Kernel config option which causes reiser4 to be instable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/29/2012 07:47 PM, Ivan Shapovalov wrote:
On 29 December 2012 01:24:48 Edward Shishkin wrote:
On 12/26/2012 05:22 PM, Ivan Shapovalov wrote:
Hello Edward,

Hi Ivan,

I've apparently managed to get a working implementation of
->migratepage().

I'm attaching a patch; it seems stable, but I'm worried somewhat because
the destination pages (parameter newpage) sometimes (pretty often) have a
non-NULL private field.

Currently I've put there a warning and a set_page_private(newpage, 0) - it
seems to work, but well...
I'm continuing to test it and will report if that actually has bad
consequences.

- Ivan

  Edward.

  >   Thanks,
  >   Ivan.
  >
  >>   Also before the release I'll try to take a look at this:
  >>   http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
  >>
  >>   This failed path might indicate that we adjusted to fs-writeback
  >>   incorrectly.
  >>
  >>   Edward.
  >>
  >>>   Regards,
  >>>   Ivan.
  >>>
  >>>>>>>         on kernel
  >>>>>>>
  >>>>>>>   3.6.10, and everything seems to be OK so far (so the
  >>>>>>>   workaround is
  >>>>>>>   version-
  >>>>>>>   agnostic).
  >>>>>>>
  >>>>>>>   Edward, are there any guesses on what can make reiser4
  >>>>>>>   choke on
  >>>>>>>   hugepages/compaction/migration?
  >>>>>>
  >>>>>>   TBH, no ideas. They (hugepages) are_transparent_.
  >>>>>>   It means we shouldn't suffer in theory;)
  >>>>>>
  >>>>>>>         I'm not even barely familiar with the kernel
  >>>>>>>
  >>>>>>>   internals.
  >>>>>>>
  >>>>>>>   Thanks,
  >>>>>>>   Ivan.

reiser4-migratepage.patch


diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
index 8d8a37d..20006fc 100644
--- a/fs/reiser4/as_ops.c
+++ b/fs/reiser4/as_ops.c
@@ -43,6 +43,7 @@

   #include<linux/backing-dev.h>
   #include<linux/quotaops.h>
   #include<linux/security.h>

+#include<linux/migrate.h>

   /* address space operations */

@@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp
UNUSED_ARG)>
   	}

   }

+int reiser4_migratepage(struct address_space *mapping, struct page
*newpage, +			struct page *page, enum migrate_mode mode)
+{
+	jnode *node;
+	int result;
+
+	assert("???-1", PageLocked(page));
+	assert("???-2", !PageWriteback(page));
+	assert("???-3", reiser4_schedulable());
+
+	if (PageDirty(page)) {
+		/*
+		 * Logic from migrate.c:fallback_migrate_page()
+		 * Only writeback pages in full synchronous migration
+		 */
+		if (mode != MIGRATE_SYNC)
+			return -EBUSY;
+		return write_one_page(page, true)<   0 ? -EIO : -EAGAIN;

Why? The migrate.c's writeout() doesn't work?
I see it performs some cleanups..

Hi,

Because it's static. :)
But OK, I exported writeout() and used it.


+	}
+
+	assert("???-4", !PageDirty(page));
+
+	assert("???-5", page->mapping != NULL);
+	assert("???-6", page->mapping->host != NULL);
+
+	/*
+	 * Iteration 1: release page before default migration by migrate_page().
+	 * Iteration 2: check if the page is releasable; if true, directly
replace +	 *              jnode's page pointer instead of releasing, then
migrate using +	 *              default migrate_page().
+	 */
+
+	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
+	 * The number of remaining references must be:
+	 * 1 for anonymous pages without a mapping
+	 * 2 for pages with a mapping
+	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+	 */
+
+	if (page_count(page)>   (PagePrivate(page) ? 3 : 2))

OK

+		return -EIO;

-EAGAIN would be better (like fallback_migrate_page() returns in
the case of "non-releasable")

Fixed.


+
+	/*
+	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.

"vfs-anonymous" are pages with page->mapping == NULL;

"reiser4-anonymous" are pages which are dirtied via ->mmap(),
but not yet captured by reiser4 transaction manager.

Note, that none of those cases takes place at this point.

Hm, indeed that's an insane comment. Fixed.


+	 * If any, just migrate them using default routine.
+	 */
+
+	if (!PagePrivate(page))
+		return migrate_page(mapping, newpage, page, mode);

OK

+
+	node = jnode_by_page(page);
+	assert("???-7", node != NULL);
+
+	/* releasable() needs jnode lock, because it looks at the jnode fields
+	 * and we need jload_lock here to avoid races with jload(). */
+	spin_lock_jnode(node);
+	spin_lock(&(node->load));
+	if (jnode_is_releasable(node)) {
+		jref(node);

OK

+
+		/* there is no need to synchronize against
+		 * jnode_extent_write() here, because pages seen by
+		 * jnode_extent_write() are !releasable(). */
+		page_clear_jnode(page, node);
+
+		if(jprivate(newpage)) {

Hmm, strange: the newpage is freshly allocated and locked..
Somebody doesn't clean up after himself ???  Don't have other
ideas..

Yes, apparently. There are no calls to set_page_private() in compaction.c or
migrate.c (actually, in last file there are some, but they do not get called
in observed pathes).


+			// FIXME: warning or what? happens on a regular basis, behavior is
unaffected. +			warning("???-10", "Migration destination page has a
non-NULL private field (%x) - resetting it", page_private(newpage));
Does it look like a pointer, which can be dereferenced?
(print it better in %p format). If so, try to detect a
jnode at this address by jnode's magic (JMAGIC 0x52654973,
present only when debug is on). To make sure we are not
the culprit..

Cannot be dereferenced. Proved by a handful of oopses...
The addresses (as printed by %p) look like "ffffea0002419800" and then
I get pagefault oopses for addresses like "0000000000001010".


This is compaction manager, who provides pages with not cleared
private info. Lustre had the same problem. I've discussed with VS,
and here is his original explanation:

Vladimir Saveliev wrote:
> migrate_pages() used to allocate new pages with a function passed as a
> parameter.
>
> int migrate_pages(struct list_head *from,
> new_page_t get_new_page, unsigned long private, bool offlining,
> bool sync)
>
> In most cases allocating function passed to migrate_pages() ends up
> with __alloc_pages() which calls get_page_from_freelist()
> ->..-> prep_new_page() which sets page->private to 0.
>
> There is one exception however. In case of compact_zone() (introduced
> in rhel6 kernels) allocating function is compaction_alloc().
>
> This function seems to avoid traditional page allocation path, it
> takes free pages from isolated free lists and page->private does not
> get set to 0.
>
> Then migrate_page_move_mapping() puts that new page into mapping's
> page tree where lustre's ll_read_ahead_page() finds nonprivate page
> with page->private != 0 and oops-es.




+			set_page_private(newpage, 0ul);
+		}
+
+		result = migrate_page(mapping, newpage, page, mode);
+		if (unlikely(result)) {
+			jnode_attach_page(node, page); /* migration failed - reattach the old
page */ +		} else {
+			jnode_attach_page(node, newpage);
+		}
+
+		jput(node);
+
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-9", reiser4_schedulable());
+		return result;
+	} else {
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-8", reiser4_schedulable());
+		return -EIO;

-EAGAIN

Fixed.


+	}
+}
+

So simply releasing the page (by default migration routine) doesn't
work, while such transfer of the relationship (page, jnode) to
another pair (newpage, jnode) does work? Can not understand this..

E.g. vmscanner (vmscan.c), which also releases releasable page, but
doesn't provide a newpage instead (in contrast with migration) works
perfectly..

/me just looked at vmscan.c, which also skips ->releasepage() when there is
no private data on page, and wonders how did that ever work with reiser4.


It does right things: ->releasepage() is only to free all resources
related to private info. Moreover, being called with zeroed private,
->releasepage() will oops.

Sorry for confusion: I was unhappy that default migratepage doesn't
check reference counter. Actually, it does: migrate_page() calls
migrate_page_move_mapping(), which checks if the counter is
2 + page_has_private(page).

I think that default migratepage (fallback_migrate_page) must work for
all file systems. For some of them it can be suboptimal, so migration
developers provided possibility to supply an optimal one via
->migratepage asop. I am sure we'll eventually understand why default
migratepage doesn't work for reiser4..



So I'd say there is nothing strange in that the fallback_migrate_page() does not work
while our custom one works (it's all about remembering about non-PagePrivate pages);
instead, it is strange that the vmscanner works while fallback_migrate_page() does not.

Maybe vmscanner finds out that the page is used by some other means?..


Did you stress it well enough?

Yes, I think so. Parallel building + various git operations + databases + virtual machines.


I think we'll release reiser4-for-3.7 with fail_migrate_page (to fix
a stability point), and continue to play with migration..

Maybe.. What are the next targets to play?


It depends on your preferences. What do you want?
Resolve performance issues? Add new features? Let me know and
I'll shed more light to the specified item...

Thanks,
Edward.
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux File System Development]     [Linux BTRFS]     [Linux NFS]     [Linux Filesystems]     [Ext4 Filesystem]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Resources]

  Powered by Linux