Re: Kernel config option which causes reiser4 to be instable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16 December 2012 16:36:38 Edward Shishkin wrote:
> On 12/14/2012 07:20 PM, Ivan Shapovalov wrote:
> > On 14 December 2012 12:07:56 Edward Shishkin wrote:
> >> On 12/14/2012 04:14 AM, Ivan Shapovalov wrote:
> >>> On 13 December 2012 23:47:10 Edward Shishkin wrote:
> >>>> On 12/11/2012 09:54 PM, Dušan Čolić wrote:
> >>>>> On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
> >>>>> 
> >>>>> <edward.shishkin@xxxxxxxxx>    wrote:
> >>>>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> >>>>>>> Hello!
> >>>>>> 
> >>>>>> Hello.
> >>>>>> 
> >>>>>>> With help of Dušan Čolić<dusanc@xxxxxxxxx>    who provided his
> >>>>>>> kernel
> >>>>>>> config
> >>>>>>> diff I've found a kernel option which, when disabled, greatly
> >>>>>>> reduces
> >>>>>>> (hopefully to zero, but need time to verify it) corruption rate in
> >>>>>>> reiser4.
> >>>>>>> 
> >>>>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it
> >>>>>>> like
> >>>>>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
> >>>>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
> >>>>>> 
> >>>>>> How long?
> >>>>> 
> >>>>> For me the difference in uptime is months without vs hours with it :D
> >>>>> on 2.6.39.4
> >>>> 
> >>>> Hm, indeed: my setup with enabled migration can not survive even one
> >>>> kernel compilation, while with disabled migration everything looks ok..
> >>> 
> >>> The overnight testing also showed no errors...
> >>> So shall we release reiser4-for-3.7 and announce FIXED(?) once again?
> >>> 
> >>> :)
> >> 
> >> I worry that migration is mandatory option for hugepages.
> >> Does fail_migrate_page() work with hugepages?
> > 
> > _Apparently_ yes. We have a counter named "compact_pagemigrate_failed" in
> > /proc/vmstat (documented in vm/transhuge.txt), which means that failing a
> > page migration is not a critical event. So hugepages and compaction will
> > work, albeit quite less effectively...
> > 
> > ...And I've immediately got a bunch of (presumably silly) questions
> 
> Nop. Good questions.
> 
> 
>   while
> 
> > trying to implement ->migratepage().
> > 
> > 1) Why it is needed to writeback dirty pages before migrating them?
> > 
> > 2) Looking at the default implementation (fallback_migrate_page()), what
> > is
> > the meaning of migrating a released page?
> 
> To make sure that nobody uses the page.
> 
> Just imagine: we allocate a page, take a reference, make page uptodate.
> At this point migration routine steals the page. Then we do kmap(), but
> virtual address is wrong. Welcome to corruption..
> 
> So, at first, migration routine wants to make sure that file system
> doesn't use the page: try_to_release_page() checks a reference
> counter (see e.g reiser4_releasepage).
> 
> 
>   In other words, doesn't "releasing"
> 
> > page anyway mean "completely freeing" it, requiring the fs to read
> > corresponding data again?
> 
> File system can not use a pointer to page which has been released.
> We should obtain a new pointer (via find_get_page(), etc). IMHO dirty
> page is a special case (this is regarding your question #1)
> 
> > 3) As far as I could understand, migrating page (from fs's point of view)
> > is just replacing all internal pointers to the "old" page with pointers
> > to the new one together with calling predefined functions
> > migrate_page_move_mapping() and migrate_page_copy(). So here's a question
> > - which structures of reiser4 (beyond jnode->pg) keep pointers to pages
> > and how to access them, given a single page?
> 
> Those pointers shouldn't be a concern, as we use them with reference
> counters hold. I don't see where we reuse pointers to released page.
> 
> When a page is successfully released, we detach it from jnode (see
> page_clear_jnode() in reiser4_releasepage()).
> 
> > I can remember cryptcompress's struct cluster_handle which stores an array
> > of pages...
> 
> All cluster handles do have a status of local variables. After
> checkin_page_cluster() we forget about the pointers while reference
> counters are still hold. After checkout_page_cluster() we drop
> reference counters and also forget about the pointers.
> 
> I see that default migration routine tries to release only pages
> with non-zero private info. It won't work for reiser4, as not all
> our pages has non-zero private info. For files managed by
> cryptcompress plugin we allocate one jnode per page cluster (by
> default 16 pages for page size 4K). And only first page of the
> cluster gets non-zero private info. So reiser4_migratepage() should
> try to release _all_ pages, not only ones with non-zero private info.
> 
> Still don't have ideas why we get corruption in the case of files
> managed by (default) unix-file plugin (where we allocate one jnode
> per page)..

Hello Edward,

I've apparently managed to get a working implementation of ->migratepage().

I'm attaching a patch; it seems stable, but I'm worried somewhat because the 
destination pages (parameter newpage) sometimes (pretty often) have a non-NULL 
private field.

Currently I've put there a warning and a set_page_private(newpage, 0) - it 
seems to work, but well...
I'm continuing to test it and will report if that actually has bad 
consequences.

- Ivan

> 
> Edward.
> 
> > Thanks,
> > Ivan.
> > 
> >> Also before the release I'll try to take a look at this:
> >> http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
> >> 
> >> This failed path might indicate that we adjusted to fs-writeback
> >> incorrectly.
> >> 
> >> Edward.
> >> 
> >>> Regards,
> >>> Ivan.
> >>> 
> >>>>>>>      on kernel
> >>>>>>> 
> >>>>>>> 3.6.10, and everything seems to be OK so far (so the workaround is
> >>>>>>> version-
> >>>>>>> agnostic).
> >>>>>>> 
> >>>>>>> Edward, are there any guesses on what can make reiser4 choke on
> >>>>>>> hugepages/compaction/migration?
> >>>>>> 
> >>>>>> TBH, no ideas. They (hugepages) are _transparent_.
> >>>>>> It means we shouldn't suffer in theory ;)
> >>>>>> 
> >>>>>>>      I'm not even barely familiar with the kernel
> >>>>>>> 
> >>>>>>> internals.
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Ivan.
diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
index 8d8a37d..20006fc 100644
--- a/fs/reiser4/as_ops.c
+++ b/fs/reiser4/as_ops.c
@@ -43,6 +43,7 @@
 #include <linux/backing-dev.h>
 #include <linux/quotaops.h>
 #include <linux/security.h>
+#include <linux/migrate.h>
 
 /* address space operations */
 
@@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
 	}
 }
 
+int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
+			struct page *page, enum migrate_mode mode)
+{
+	jnode *node;
+	int result;
+
+	assert("???-1", PageLocked(page));
+	assert("???-2", !PageWriteback(page));
+	assert("???-3", reiser4_schedulable());
+
+	if (PageDirty(page)) {
+		/*
+		 * Logic from migrate.c:fallback_migrate_page()
+		 * Only writeback pages in full synchronous migration
+		 */
+		if (mode != MIGRATE_SYNC)
+			return -EBUSY;
+		return write_one_page(page, true) < 0 ? -EIO : -EAGAIN;
+	}
+
+	assert("???-4", !PageDirty(page));
+
+	assert("???-5", page->mapping != NULL);
+	assert("???-6", page->mapping->host != NULL);
+
+	/*
+	 * Iteration 1: release page before default migration by migrate_page().
+	 * Iteration 2: check if the page is releasable; if true, directly replace
+	 *              jnode's page pointer instead of releasing, then migrate using
+	 *              default migrate_page().
+	 */
+
+	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
+	 * The number of remaining references must be:
+	 * 1 for anonymous pages without a mapping
+	 * 2 for pages with a mapping
+	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+	 */
+
+	if (page_count(page) > (PagePrivate(page) ? 3 : 2))
+		return -EIO;
+
+	/*
+	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.
+	 * If any, just migrate them using default routine.
+	 */
+
+	if (!PagePrivate(page))
+		return migrate_page(mapping, newpage, page, mode);
+
+	node = jnode_by_page(page);
+	assert("???-7", node != NULL);
+
+	/* releasable() needs jnode lock, because it looks at the jnode fields
+	 * and we need jload_lock here to avoid races with jload(). */
+	spin_lock_jnode(node);
+	spin_lock(&(node->load));
+	if (jnode_is_releasable(node)) {
+		jref(node);
+
+		/* there is no need to synchronize against
+		 * jnode_extent_write() here, because pages seen by
+		 * jnode_extent_write() are !releasable(). */
+		page_clear_jnode(page, node);
+
+		if(jprivate(newpage)) {
+			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
+			warning("???-10", "Migration destination page has a non-NULL private field (%x) - resetting it", page_private(newpage));
+			set_page_private(newpage, 0ul);
+		}
+
+		result = migrate_page(mapping, newpage, page, mode);
+		if (unlikely(result)) {
+			jnode_attach_page(node, page); /* migration failed - reattach the old page */
+		} else {
+			jnode_attach_page(node, newpage);
+		}
+
+		jput(node);
+
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-9", reiser4_schedulable());
+		return result;
+	} else {
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-8", reiser4_schedulable());
+		return -EIO;
+	}
+}
+
 int reiser4_readpage(struct file *file, struct page *page)
 {
 	assert("edward-1533", PageLocked(page));
diff --git a/fs/reiser4/page_cache.c b/fs/reiser4/page_cache.c
index bce07ea..1bbb9dc 100644
--- a/fs/reiser4/page_cache.c
+++ b/fs/reiser4/page_cache.c
@@ -548,7 +548,8 @@ static struct address_space_operations formatted_fake_as_ops = {
 	   and, may be made page itself free-able.
 	 */
 	.releasepage = reiser4_releasepage,
-	.direct_IO = NULL
+	.direct_IO = NULL,
+	.migratepage = reiser4_migratepage
 };
 
 /* called just before page is released (no longer used by reiser4). Callers:
diff --git a/fs/reiser4/plugin/object.c b/fs/reiser4/plugin/object.c
index e8db03b..ef37a2d 100644
--- a/fs/reiser4/plugin/object.c
+++ b/fs/reiser4/plugin/object.c
@@ -121,7 +121,8 @@ static struct address_space_operations regular_file_a_ops = {
 	.write_end = reiser4_write_end_careful,
 	.bmap = reiser4_bmap_careful,
 	.invalidatepage = reiser4_invalidatepage,
-	.releasepage = reiser4_releasepage
+	.releasepage = reiser4_releasepage,
+	.migratepage = reiser4_migratepage
 };
 
 /* VFS methods for symlink files */
@@ -172,7 +173,8 @@ static struct address_space_operations directory_a_ops = {
 	.write_end = bugop,
 	.bmap = bugop,
 	.invalidatepage = bugop,
-	.releasepage = bugop
+	.releasepage = bugop,
+	.migratepage = bugop
 };
 
 /*
diff --git a/fs/reiser4/vfs_ops.h b/fs/reiser4/vfs_ops.h
index 6ca85f8..3c6c0f3 100644
--- a/fs/reiser4/vfs_ops.h
+++ b/fs/reiser4/vfs_ops.h
@@ -24,6 +24,8 @@ int reiser4_writepage(struct page *, struct writeback_control *);
 int reiser4_set_page_dirty(struct page *);
 void reiser4_invalidatepage(struct page *, unsigned long offset);
 int reiser4_releasepage(struct page *, gfp_t);
+int reiser4_migratepage(struct address_space *, struct page *,
+			struct page *, enum migrate_mode);
 
 extern int reiser4_update_sd(struct inode *);
 extern int reiser4_add_nlink(struct inode *, struct inode *, int);

[Index of Archives]     [Linux File System Development]     [Linux BTRFS]     [Linux NFS]     [Linux Filesystems]     [Ext4 Filesystem]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Resources]

  Powered by Linux