Hopefully this should go through. I am posting from my gmail account (which I rarely use) as vger.kernel.org (not just linux-fsdevel@, but also git@) is bouncing my @users.sf.net account also. On 30 June 2015 at 17:59, Michael Fox <415fox@xxxxxxxxx> wrote: > By the way, linux-fsdevel@xxxxxxxxxxxxxxx, bounced my response. > > On Tue, Jun 30, 2015 at 9:56 AM, Michael Fox <415fox@xxxxxxxxx> wrote: >> >> Hi Hin-Tak. I gave up on dual-booting many hard drives ago so I have >> nowhere to test your patch. Thank you for your work. Your e-mail is >> especially thorough and professional. I hope your patch gets accepted soon. >> >> On Tue, Jun 30, 2015 at 8:55 AM, Hin-Tak Leung <hintak.leung@xxxxxxxxx> >> wrote: >>> >>> Michael Fox: FYI. If you feel like responding (and possibly add a >>> "Tested-By:"), >>> please include liux-fs-devel linux-fsdevel in the reply. >>> >>> >>> ---------- Forwarded message ---------- >>> From: Hin-Tak Leung <hintak.leung@xxxxxxxxx> >>> Date: 28 June 2015 at 02:39 >>> Subject: [PATCH v2] hfs,hfsplus: cache pages correctly between >>> bnode_create and bnode_free >>> To: linux-fsdevel@xxxxxxxxxxxxxxx >>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>, Sergei Antonov >>> <saproj@xxxxxxxxx>, Al Viro <viro@xxxxxxxxxxxxxxxxxx>, Christoph >>> Hellwig <hch@xxxxxxxxxxxxx>, Andrew Morton >>> <akpm@xxxxxxxxxxxxxxxxxxxx>, Vyacheslav Dubeyko <slava@xxxxxxxxxxx>, >>> Sougata Santra <sougata@xxxxxxxxxx> >>> >>> >>> From: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> >>> >>> Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and >>> hfs_bnode_find() for finding or creating pages corresponding to an inode) >>> are >>> immediately kmap()'ed and used (both read and write) and kunmap()'ed, and >>> should not be page_cache_release()'ed until hfs_bnode_free(). >>> >>> This patch fixes a problem I first saw in July 2012: merely running "du" >>> on a >>> large hfsplus-mounted directory a few times on a reasonably loaded system >>> would >>> get the hfsplus driver all confused and complaining about B-tree >>> inconsistencies, and generates a "BUG: Bad page state". Most recently, I >>> can >>> generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, >>> by >>> running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du >>> /mnt" >>> simultaneously on two windows, where /mnt is a lightly-used QEMU VM image >>> of >>> the full Mac OS X 10.9: >>> >>> $ df -i / /home /mnt >>> Filesystem Inodes IUsed IFree IUse% Mounted on >>> /dev/mapper/fedora-root 3276800 551665 2725135 17% / >>> /dev/mapper/fedora-home 52879360 716221 52163139 2% /home >>> /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt >>> >>> After applying the patch, I was able to run "du /" (60+ times) and >>> "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. >>> >>> There are many reports of the hfsplus driver getting confused under load >>> and >>> generating "BUG: Bad page state" or other similar issues over the years. >>> [1] >>> >>> The unpatched code [2] has always been wrong since it entered the kernel >>> tree. >>> The only reason why it gets away with it is that the kmap/memcpy/kunmap >>> follow >>> very quickly after the page_cache_release() so the kernel has not had a >>> chance >>> to reuse the memory for something else, most of the time. >>> >>> The current RW driver appears to have followed the design and development >>> of >>> the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) >>> had a >>> B-tree node-centric approach to read_cache_page()/page_cache_release() >>> per >>> bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of >>> caching >>> and releasing pages per inode extents. When the current RW code first >>> entered >>> the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" >>> commented >>> out code) to switch between B-node centric paging to inode-centric >>> paging. >>> There was a mistake with the direction of one of the REF_PAGES >>> conditionals in >>> __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the >>> read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were >>> removed, but a page_cache_release() was mistakenly left in (propagating >>> the >>> "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out >>> page_cache_release() >>> in bnode_release() (which should be spanned by !REF_PAGES) was never >>> enabled. >>> >>> References: >>> [1]: >>> Michael Fox, Apr 2013 >>> http://www.spinics.net/lists/linux-fsdevel/msg63807.html >>> ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") >>> >>> Sasha Levin, Feb 2015 >>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>> >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 >>> https://bugzilla.kernel.org/show_bug.cgi?id=42342 >>> https://bugzilla.kernel.org/show_bug.cgi?id=63841 >>> https://bugzilla.kernel.org/show_bug.cgi?id=78761 >>> >>> [2]: >>> http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ >>> fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db >>> commit d1081202f1d0ee35ab0beb490da4b65d4bc763db >>> Author: Andrew Morton <akpm@xxxxxxxx> >>> Date: Wed Feb 25 16:17:36 2004 -0800 >>> >>> [PATCH] HFS rewrite >>> >>> http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ >>> fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd >>> >>> commit 91556682e0bf004d98a529bf829d339abb98bbbd >>> Author: Andrew Morton <akpm@xxxxxxxx> >>> Date: Wed Feb 25 16:17:48 2004 -0800 >>> >>> [PATCH] HFS+ support >>> >>> [3]: >>> http://sourceforge.net/projects/linux-hfsplus/ >>> >>> >>> http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ >>> >>> http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>> >>> http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ >>> fs/hfsplus/bnode.c?r1=1.4&r2=1.5 >>> >>> Date: Thu Jun 6 09:45:14 2002 +0000 >>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>> >>> [4]: >>> http://git.kernel.org/cgit/linux/kernel/git/\ >>> >>> stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d >>> >>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>> Date: Tue Sep 6 15:18:47 2005 -0700 >>> >>> [PATCH] hfs: remove debug code >>> >>> Signed-off-by: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> >>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx> >>> Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx> >>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> >>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> >>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx> >>> Cc: Sougata Santra <sougata@xxxxxxxxxx> >>> --- >>> fs/hfs/bnode.c | 9 ++++----- >>> fs/hfsplus/bnode.c | 3 --- >>> 2 files changed, 4 insertions(+), 8 deletions(-) >>> >>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>> index d3fa6bd..221719e 100644 >>> --- a/fs/hfs/bnode.c >>> +++ b/fs/hfs/bnode.c >>> @@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> - page_cache_release(page); >>> node->page[i] = page; >>> } >>> >>> @@ -398,11 +397,11 @@ node_error: >>> >>> void hfs_bnode_free(struct hfs_bnode *node) >>> { >>> - //int i; >>> + int i; >>> >>> - //for (i = 0; i < node->tree->pages_per_bnode; i++) >>> - // if (node->page[i]) >>> - // page_cache_release(node->page[i]); >>> + for (i = 0; i < node->tree->pages_per_bnode; i++) >>> + if (node->page[i]) >>> + page_cache_release(node->page[i]); >>> kfree(node); >>> } >>> >>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>> index 759708f..6392466 100644 >>> --- a/fs/hfsplus/bnode.c >>> +++ b/fs/hfsplus/bnode.c >>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> - page_cache_release(page); >>> node->page[i] = page; >>> } >>> >>> @@ -566,13 +565,11 @@ node_error: >>> >>> void hfs_bnode_free(struct hfs_bnode *node) >>> { >>> -#if 0 >>> int i; >>> >>> for (i = 0; i < node->tree->pages_per_bnode; i++) >>> if (node->page[i]) >>> page_cache_release(node->page[i]); >>> -#endif >>> kfree(node); >>> } >>> >>> -- >>> 2.4.3 >> >> >> >> >> -- >> >> - >> Michael > > > > > -- > > - > Michael -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html