Re: [PATCH] hfsplus: release bnode pages after use, not before

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 18 June 2015 at 21:51, Sergei Antonov <saproj@xxxxxxxxx> wrote:
> On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>> On 18 June 2015 at 17:19, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>>>>>> Hi Andrew and everybody else,
>>>>>>
>>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>>> debug code"
>>>>>> commit below did not remove the right stuff.
>>>>>>
>>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>>> need to release on bnode_free),
>>>>>> then the code was modified for efficiency so that pages are not released
>>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>>
>>>>>> I think the release at bnode_free was commented out because the person who
>>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>>> other/more places that one might need to release the pages than just
>>>>>> at bnode_free().
>>>>>>
>>>>>> Does this train of thought make sense?
>>>>>
>>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>>> at first it was my thought too. But later I considered what the logic
>>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>>> !REF_PAGES was what he really meant.
>>>>>
>>>>
>>>> You are wrong, it seems. I found even earlier versions of the code,
>>>> with history, at
>>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>>> The earlier version of the code
>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>>> )
>>>> have those removed, dated 2001 december and 2002 june.
>>>
>>> Interesting. I only looked at historical linux git.
>>>
>>>> The change log says so:
>>>>
>>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>>
>>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>>
>>>> The change also removed any page_cache_release() . It would seem that
>>>> the additional REF_PAGES
>>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>>> development logic, and eventually took over.
>>>>
>>>> The release-early code came first, not after. That makes sense, as
>>>> releasing early
>>>
>>> We may be talking about different things. I did not write what code
>>> came earlier and what after. Anyway, this becomes to difficult to
>>> follow. My point was that in the variant without refs, calling
>>> page_cache_release() right after to read_cache_page() was exactly what
>>> the author wanted (although it is a bug).
>>>
>>> Again, we are probably arguing about different things. And I'm a
>>> little less zealous than you to do this code archeology.
>>>
>>
>> I would really prefer that you don't use personal descriptions such as "zealous"
>> as well as making the kind of comments such as
>> "The result of the test (which I'm sure will show the bug is gone) will
>> be by far more valuable contribution than your other messages in the
>> thread.". Those are not helpful.
>>
>> In my first reply I have already stated my reasoning: your fix consists
>> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
>> dead code was written for a purpose which is now lost and forgotten.
>
> Of course. And it is not hard to guess that purpose quite reliably :).
> The code used to release pages correctly, i. e. when freeing the bnode
> structure.
>
>> And here is that purpose - early development of the code was done
>> with getting and putting pages within each routine, for safeness, and migrating
>> towards caching pages for a longer life cycle in 2001-2002. When the
>> code was cleaned
>> up and re-worked before putting into the kernel in 2005, this
>> development process
>> was repeated but with a REF_PAGES conditional and some additional code
>> which was // commented out.
>>
>> The migration to caching pages
>> would have been more or less correct, if REF_PAGES was flipped
>> and the // commented out code was enabled.  Except there was a mistake
>> in one of the conditionals
>
> By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
> a mistake under the conditions you provided above. With REF_PAGES
> flipped to 1 and the // piece uncommented you don't want to release in
> __hfs_bnode_create() because releasing is done in hfs_bnode_free().
>

Yes, that's what I am saying. I believe the development of the code happens
as follows:

- bnode_get() does get_page(), bnode_put() does put_page(), and the rest
of the routines - i.e. bnode_create() look up the pages but
immediately release them. bnode_free()
does nothing about pages.

- then it was desired to keep the pages longer without the repeated
get_pages and put_pages.
so those are "#if REF_PAGES" out. And we no longer get and put pages on demand
and should also stop releasing them after looking up at bnode_create() also.
(the bnode_create() is a poor name for what it does, but never mind). New code
is added to release at bnode_free(). The new code should have been spanned
by the opposite, i.e. "#if !REF_PAGES", but was inserted as place
holder with //.

The rest of the story I have already tried to explain a couple of
times. The switch
from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per
bnode_get/bnode_put routines,
to keeping pages around longer between bnode_create/bnode_free) did
not quite work correctly.

For the others who want to see the version of the code with those REF_PAGES
and want to weigh in on this,
I cannot find a web interface for the repo, but it is the commit below
in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@xxxxxxxx>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

    From: Roman Zippel <zippel@xxxxxxxxxxxxxx>

    This driver adds full read/write support for HFS+ and is based on the
    readonly driver by Brad Broyer.

    Thanks to Ethan Benson <erbenson@xxxxxxxxxx> for a number of patches to
    make the driver more compliant with the spec.

I would also repeat that earlier code and change history was at
http://sourceforge.net/projects/linux-hfsplus/, and it showed the
changes between
2001 dec to 2002 june

from get_page/put_page per bnode_get/bnode_put to removing those and
switching to a longer page life-cycle with the log message:

 "Use buffer cache instead of page cache in bnode.c. Cache inode extents."


>>, and also the commented out code never got
>> enabled. The commented out code was where a !REF_PAGES supposed to be.
>> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
>> was a mistake that has propagated.
>>
>>
>>>> and read_cache_page on each bnode_get is the slower, safer but
>>>> inefficient approach.
>>>> It would be absurd to go from caching pages, then to 'experiment' to
>>>> see if release-early + later read_cache_page
>>>> on each bnode_get 'improves'.
>>>>
>>>>>>
>>>>>> Hin-Tak
>>>>>>
>>>>>> <commit>
>>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>>>
>>>>>>     [PATCH] hfs: remove debug code
>>>>>>
>>>>>>     This removes some old debug code, which is no longer needed.
>>>>>>
>>>>>>     Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>>>>>     Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
>>>>>>     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
>>>>>>
>>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>>>> index a096c5a..3d5cdc6 100644
>>>>>> --- a/fs/hfs/bnode.c
>>>>>> +++ b/fs/hfs/bnode.c
>>>>>> @@ -13,8 +13,6 @@
>>>>>>
>>>>>>  #include "btree.h"
>>>>>>
>>>>>> -#define REF_PAGES    0
>>>>>> -
>>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>>>          int off, int len)
>>>>>>  {
>>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>>> hfs_btree *tree, u32 cnid)
>>>>>>              page_cache_release(page);
>>>>>>              goto fail;
>>>>>>          }
>>>>>> -#if !REF_PAGES
>>>>>>          page_cache_release(page);
>>>>>> -#endif
>>>>>>          node->page[i] = page;
>>>>>>      }
>>>>>>
>>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>>  {
>>>>>>      if (node) {
>>>>>>          atomic_inc(&node->refcnt);
>>>>>> -#if REF_PAGES
>>>>>> -        {
>>>>>> -        int i;
>>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> -            get_page(node->page[i]);
>>>>>> -        }
>>>>>> -#endif
>>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>      }
>>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>          if (!atomic_read(&node->refcnt))
>>>>>>              BUG();
>>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>>> -#if REF_PAGES
>>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>>> -                put_page(node->page[i]);
>>>>>> -#endif
>>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>>              return;
>>>>>> -        }
>>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>>              if (!node->page[i])
>>>>>>                  continue;
>>>>>>              mark_page_accessed(node->page[i]);
>>>>>> -#if REF_PAGES
>>>>>> -            put_page(node->page[i]);
>>>>>> -#endif
>>>>>>          }
>>>>>>
>>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>> index 8868d3b..b85abc6 100644
>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>> @@ -18,8 +18,6 @@
>>>>>>  #include "hfsplus_fs.h"
>>>>>>  #include "hfsplus_raw.h"
>>>>>>
>>>>>> -#define REF_PAGES    0
>>>>>> -
>>>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>>>  {
>>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>>> hfs_btree *tree, u32 cnid)
>>>>>>              page_cache_release(page);
>>>>>>              goto fail;
>>>>>>          }
>>>>>> -#if !REF_PAGES
>>>>>>          page_cache_release(page);
>>>>>> -#endif
>>>>>>          node->page[i] = page;
>>>>>>      }
>>>>>>
>>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>>  {
>>>>>>      if (node) {
>>>>>>          atomic_inc(&node->refcnt);
>>>>>> -#if REF_PAGES
>>>>>> -        {
>>>>>> -        int i;
>>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> -            get_page(node->page[i]);
>>>>>> -        }
>>>>>> -#endif
>>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>      }
>>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>          if (!atomic_read(&node->refcnt))
>>>>>>              BUG();
>>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>>> -#if REF_PAGES
>>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>>> -                put_page(node->page[i]);
>>>>>> -#endif
>>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>>              return;
>>>>>> -        }
>>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>>              if (!node->page[i])
>>>>>>                  continue;
>>>>>>              mark_page_accessed(node->page[i]);
>>>>>> -#if REF_PAGES
>>>>>> -            put_page(node->page[i]);
>>>>>> -#endif
>>>>>>          }
>>>>>>
>>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>>> </commit>
>>>>>>
>>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>> ------------------------------
>>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>>>
>>>>>>>>Hi Andrew,
>>>>>>>>
>>>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>>>
>>>>>>>>Feel free to add:
>>>>>>>>
>>>>>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx>
>>>>>>>>
>>>>>>>>Thanks a lot in advance!
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>
>>>>>>>>    Anton
>>>>>>>
>>>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>>>
>>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>>>
>>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>>>
>>>>>>> Hin-Tak
>>>>>>>
>>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>>>
>>>>>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>>>>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
>>>>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
>>>>>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
>>>>>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx>
>>>>>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
>>>>>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>>>> index 759708f..5af50fb 100644
>>>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>>>             page_cache_release(page);
>>>>>>>>>             goto fail;
>>>>>>>>>         }
>>>>>>>>> -        page_cache_release(page);
>>>>>>>>>         node->page[i] = page;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>>>
>>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>>>> {
>>>>>>>>> -#if 0
>>>>>>>>>     int i;
>>>>>>>>>
>>>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>>>         if (node->page[i])
>>>>>>>>>             page_cache_release(node->page[i]);
>>>>>>>>> -#endif
>>>>>>>>> +    }
>>>>>>>>>     kfree(node);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 2.3.0
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>--
>>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>>>Linux NTFS maintainer
>>>>>>>>
>>>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux