Re: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hopefully this should go through. I am posting from my gmail account
(which I rarely use)
as vger.kernel.org (not just linux-fsdevel@, but also git@) is bouncing
my @users.sf.net account also.

On 30 June 2015 at 17:59, Michael Fox <415fox@xxxxxxxxx> wrote:
> By the way, linux-fsdevel@xxxxxxxxxxxxxxx, bounced my response.
>
> On Tue, Jun 30, 2015 at 9:56 AM, Michael Fox <415fox@xxxxxxxxx> wrote:
>>
>> Hi Hin-Tak. I gave up on dual-booting many hard drives ago so I have
>> nowhere to test your patch. Thank you for your work. Your e-mail is
>> especially thorough and professional. I hope your patch gets accepted soon.
>>
>> On Tue, Jun 30, 2015 at 8:55 AM, Hin-Tak Leung <hintak.leung@xxxxxxxxx>
>> wrote:
>>>
>>> Michael Fox: FYI. If you feel like responding (and possibly add a
>>> "Tested-By:"),
>>> please include liux-fs-devel linux-fsdevel in the reply.
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Hin-Tak Leung <hintak.leung@xxxxxxxxx>
>>> Date: 28 June 2015 at 02:39
>>> Subject: [PATCH v2] hfs,hfsplus: cache pages correctly between
>>> bnode_create and bnode_free
>>> To: linux-fsdevel@xxxxxxxxxxxxxxx
>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>, Sergei Antonov
>>> <saproj@xxxxxxxxx>, Al Viro <viro@xxxxxxxxxxxxxxxxxx>, Christoph
>>> Hellwig <hch@xxxxxxxxxxxxx>, Andrew Morton
>>> <akpm@xxxxxxxxxxxxxxxxxxxx>, Vyacheslav Dubeyko <slava@xxxxxxxxxxx>,
>>> Sougata Santra <sougata@xxxxxxxxxx>
>>>
>>>
>>> From: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
>>>
>>> Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
>>> hfs_bnode_find() for finding or creating pages corresponding to an inode)
>>> are
>>> immediately kmap()'ed and used (both read and write) and kunmap()'ed, and
>>> should not be page_cache_release()'ed until hfs_bnode_free().
>>>
>>> This patch fixes a problem I first saw in July 2012: merely running "du"
>>> on a
>>> large hfsplus-mounted directory a few times on a reasonably loaded system
>>> would
>>> get the hfsplus driver all confused and complaining about B-tree
>>> inconsistencies, and generates a "BUG: Bad page state". Most recently, I
>>> can
>>> generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5,
>>> by
>>> running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du
>>> /mnt"
>>> simultaneously on two windows, where /mnt is a lightly-used QEMU VM image
>>> of
>>> the full Mac OS X 10.9:
>>>
>>> $ df -i / /home /mnt
>>> Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
>>> /dev/mapper/fedora-root    3276800  551665    2725135   17% /
>>> /dev/mapper/fedora-home   52879360  716221   52163139    2% /home
>>> /dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt
>>>
>>> After applying the patch, I was able to run "du /" (60+ times) and
>>> "du /mnt" (150+ times) continuously and simultaneously for 6+ hours.
>>>
>>> There are many reports of the hfsplus driver getting confused under load
>>> and
>>> generating "BUG: Bad page state" or other similar issues over the years.
>>> [1]
>>>
>>> The unpatched code [2] has always been wrong since it entered the kernel
>>> tree.
>>> The only reason why it gets away with it is that the kmap/memcpy/kunmap
>>> follow
>>> very quickly after the page_cache_release() so the kernel has not had a
>>> chance
>>> to reuse the memory for something else, most of the time.
>>>
>>> The current RW driver appears to have followed the design and development
>>> of
>>> the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001)
>>> had a
>>> B-tree node-centric approach to read_cache_page()/page_cache_release()
>>> per
>>> bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of
>>> caching
>>> and releasing pages per inode extents. When the current RW code first
>>> entered
>>> the kernel [2] in 2005, there was an REF_PAGES conditional (and "//"
>>> commented
>>> out code) to switch between B-node centric paging to inode-centric
>>> paging.
>>> There was a mistake with the direction of one of the REF_PAGES
>>> conditionals in
>>> __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
>>> read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
>>> removed, but a page_cache_release() was mistakenly left in (propagating
>>> the
>>> "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
>>> page_cache_release()
>>> in bnode_release() (which should be spanned by !REF_PAGES) was never
>>> enabled.
>>>
>>> References:
>>> [1]:
>>> Michael Fox, Apr 2013
>>> http://www.spinics.net/lists/linux-fsdevel/msg63807.html
>>> ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
>>>
>>> Sasha Levin, Feb 2015
>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
>>> https://bugzilla.kernel.org/show_bug.cgi?id=42342
>>> https://bugzilla.kernel.org/show_bug.cgi?id=63841
>>> https://bugzilla.kernel.org/show_bug.cgi?id=78761
>>>
>>> [2]:
>>> http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
>>> fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
>>> commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
>>> Author: Andrew Morton <akpm@xxxxxxxx>
>>> Date:   Wed Feb 25 16:17:36 2004 -0800
>>>
>>>     [PATCH] HFS rewrite
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
>>> fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
>>>
>>> commit 91556682e0bf004d98a529bf829d339abb98bbbd
>>> Author: Andrew Morton <akpm@xxxxxxxx>
>>> Date:   Wed Feb 25 16:17:48 2004 -0800
>>>
>>>     [PATCH] HFS+ support
>>>
>>> [3]:
>>> http://sourceforge.net/projects/linux-hfsplus/
>>>
>>>
>>> http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
>>>
>>> http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>>
>>> http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
>>> fs/hfsplus/bnode.c?r1=1.4&r2=1.5
>>>
>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>> Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>
>>> [4]:
>>> http://git.kernel.org/cgit/linux/kernel/git/\
>>>
>>> stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
>>>
>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>
>>> [PATCH] hfs: remove debug code
>>>
>>> Signed-off-by: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx>
>>> Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx>
>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
>>> Cc: Sougata Santra <sougata@xxxxxxxxxx>
>>> ---
>>>  fs/hfs/bnode.c     | 9 ++++-----
>>>  fs/hfsplus/bnode.c | 3 ---
>>>  2 files changed, 4 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>> index d3fa6bd..221719e 100644
>>> --- a/fs/hfs/bnode.c
>>> +++ b/fs/hfs/bnode.c
>>> @@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>                         page_cache_release(page);
>>>                         goto fail;
>>>                 }
>>> -               page_cache_release(page);
>>>                 node->page[i] = page;
>>>         }
>>>
>>> @@ -398,11 +397,11 @@ node_error:
>>>
>>>  void hfs_bnode_free(struct hfs_bnode *node)
>>>  {
>>> -       //int i;
>>> +       int i;
>>>
>>> -       //for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -       //      if (node->page[i])
>>> -       //              page_cache_release(node->page[i]);
>>> +       for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> +               if (node->page[i])
>>> +                       page_cache_release(node->page[i]);
>>>         kfree(node);
>>>  }
>>>
>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>> index 759708f..6392466 100644
>>> --- a/fs/hfsplus/bnode.c
>>> +++ b/fs/hfsplus/bnode.c
>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>                         page_cache_release(page);
>>>                         goto fail;
>>>                 }
>>> -               page_cache_release(page);
>>>                 node->page[i] = page;
>>>         }
>>>
>>> @@ -566,13 +565,11 @@ node_error:
>>>
>>>  void hfs_bnode_free(struct hfs_bnode *node)
>>>  {
>>> -#if 0
>>>         int i;
>>>
>>>         for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>                 if (node->page[i])
>>>                         page_cache_release(node->page[i]);
>>> -#endif
>>>         kfree(node);
>>>  }
>>>
>>> --
>>> 2.4.3
>>
>>
>>
>>
>> --
>>
>> -
>> Michael
>
>
>
>
> --
>
> -
> Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux