Re: [PATCH] hfsplus: release bnode pages after use, not before

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
> On 18 June 2015 at 17:19, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>>>>> Hi Andrew and everybody else,
>>>>>
>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>> debug code"
>>>>> commit below did not remove the right stuff.
>>>>>
>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>> need to release on bnode_free),
>>>>> then the code was modified for efficiency so that pages are not released
>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>
>>>>> I think the release at bnode_free was commented out because the person who
>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>> other/more places that one might need to release the pages than just
>>>>> at bnode_free().
>>>>>
>>>>> Does this train of thought make sense?
>>>>
>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>> at first it was my thought too. But later I considered what the logic
>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>> !REF_PAGES was what he really meant.
>>>>
>>>
>>> You are wrong, it seems. I found even earlier versions of the code,
>>> with history, at
>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>> The earlier version of the code
>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>> )
>>> have those removed, dated 2001 december and 2002 june.
>>
>> Interesting. I only looked at historical linux git.
>>
>>> The change log says so:
>>>
>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>
>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>
>>> The change also removed any page_cache_release() . It would seem that
>>> the additional REF_PAGES
>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>> development logic, and eventually took over.
>>>
>>> The release-early code came first, not after. That makes sense, as
>>> releasing early
>>
>> We may be talking about different things. I did not write what code
>> came earlier and what after. Anyway, this becomes to difficult to
>> follow. My point was that in the variant without refs, calling
>> page_cache_release() right after to read_cache_page() was exactly what
>> the author wanted (although it is a bug).
>>
>> Again, we are probably arguing about different things. And I'm a
>> little less zealous than you to do this code archeology.
>>
>
> I would really prefer that you don't use personal descriptions such as "zealous"
> as well as making the kind of comments such as
> "The result of the test (which I'm sure will show the bug is gone) will
> be by far more valuable contribution than your other messages in the
> thread.". Those are not helpful.
>
> In my first reply I have already stated my reasoning: your fix consists
> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
> dead code was written for a purpose which is now lost and forgotten.

Of course. And it is not hard to guess that purpose quite reliably :).
The code used to release pages correctly, i. e. when freeing the bnode
structure.

> And here is that purpose - early development of the code was done
> with getting and putting pages within each routine, for safeness, and migrating
> towards caching pages for a longer life cycle in 2001-2002. When the
> code was cleaned
> up and re-worked before putting into the kernel in 2005, this
> development process
> was repeated but with a REF_PAGES conditional and some additional code
> which was // commented out.
>
> The migration to caching pages
> would have been more or less correct, if REF_PAGES was flipped
> and the // commented out code was enabled.  Except there was a mistake
> in one of the conditionals

By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
a mistake under the conditions you provided above. With REF_PAGES
flipped to 1 and the // piece uncommented you don't want to release in
__hfs_bnode_create() because releasing is done in hfs_bnode_free().

>, and also the commented out code never got
> enabled. The commented out code was where a !REF_PAGES supposed to be.
> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
> was a mistake that has propagated.
>
>
>>> and read_cache_page on each bnode_get is the slower, safer but
>>> inefficient approach.
>>> It would be absurd to go from caching pages, then to 'experiment' to
>>> see if release-early + later read_cache_page
>>> on each bnode_get 'improves'.
>>>
>>>>>
>>>>> Hin-Tak
>>>>>
>>>>> <commit>
>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>>
>>>>>     [PATCH] hfs: remove debug code
>>>>>
>>>>>     This removes some old debug code, which is no longer needed.
>>>>>
>>>>>     Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>>>>     Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
>>>>>     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
>>>>>
>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>>> index a096c5a..3d5cdc6 100644
>>>>> --- a/fs/hfs/bnode.c
>>>>> +++ b/fs/hfs/bnode.c
>>>>> @@ -13,8 +13,6 @@
>>>>>
>>>>>  #include "btree.h"
>>>>>
>>>>> -#define REF_PAGES    0
>>>>> -
>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>>          int off, int len)
>>>>>  {
>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>> hfs_btree *tree, u32 cnid)
>>>>>              page_cache_release(page);
>>>>>              goto fail;
>>>>>          }
>>>>> -#if !REF_PAGES
>>>>>          page_cache_release(page);
>>>>> -#endif
>>>>>          node->page[i] = page;
>>>>>      }
>>>>>
>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>  {
>>>>>      if (node) {
>>>>>          atomic_inc(&node->refcnt);
>>>>> -#if REF_PAGES
>>>>> -        {
>>>>> -        int i;
>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> -            get_page(node->page[i]);
>>>>> -        }
>>>>> -#endif
>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>      }
>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>          if (!atomic_read(&node->refcnt))
>>>>>              BUG();
>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>> -#if REF_PAGES
>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>> -                put_page(node->page[i]);
>>>>> -#endif
>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>              return;
>>>>> -        }
>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>              if (!node->page[i])
>>>>>                  continue;
>>>>>              mark_page_accessed(node->page[i]);
>>>>> -#if REF_PAGES
>>>>> -            put_page(node->page[i]);
>>>>> -#endif
>>>>>          }
>>>>>
>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>> index 8868d3b..b85abc6 100644
>>>>> --- a/fs/hfsplus/bnode.c
>>>>> +++ b/fs/hfsplus/bnode.c
>>>>> @@ -18,8 +18,6 @@
>>>>>  #include "hfsplus_fs.h"
>>>>>  #include "hfsplus_raw.h"
>>>>>
>>>>> -#define REF_PAGES    0
>>>>> -
>>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>>  {
>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>> hfs_btree *tree, u32 cnid)
>>>>>              page_cache_release(page);
>>>>>              goto fail;
>>>>>          }
>>>>> -#if !REF_PAGES
>>>>>          page_cache_release(page);
>>>>> -#endif
>>>>>          node->page[i] = page;
>>>>>      }
>>>>>
>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>  {
>>>>>      if (node) {
>>>>>          atomic_inc(&node->refcnt);
>>>>> -#if REF_PAGES
>>>>> -        {
>>>>> -        int i;
>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> -            get_page(node->page[i]);
>>>>> -        }
>>>>> -#endif
>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>      }
>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>          if (!atomic_read(&node->refcnt))
>>>>>              BUG();
>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>> -#if REF_PAGES
>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>> -                put_page(node->page[i]);
>>>>> -#endif
>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>              return;
>>>>> -        }
>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>              if (!node->page[i])
>>>>>                  continue;
>>>>>              mark_page_accessed(node->page[i]);
>>>>> -#if REF_PAGES
>>>>> -            put_page(node->page[i]);
>>>>> -#endif
>>>>>          }
>>>>>
>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>> </commit>
>>>>>
>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> ------------------------------
>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>>
>>>>>>>Hi Andrew,
>>>>>>>
>>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>>
>>>>>>>Feel free to add:
>>>>>>>
>>>>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx>
>>>>>>>
>>>>>>>Thanks a lot in advance!
>>>>>>>
>>>>>>>Best regards,
>>>>>>>
>>>>>>>    Anton
>>>>>>
>>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>>
>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>>
>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>>
>>>>>> Hin-Tak
>>>>>>
>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>>
>>>>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>>>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
>>>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
>>>>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
>>>>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx>
>>>>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
>>>>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx>
>>>>>>>> ---
>>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>>> index 759708f..5af50fb 100644
>>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>>             page_cache_release(page);
>>>>>>>>             goto fail;
>>>>>>>>         }
>>>>>>>> -        page_cache_release(page);
>>>>>>>>         node->page[i] = page;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>>
>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>>> {
>>>>>>>> -#if 0
>>>>>>>>     int i;
>>>>>>>>
>>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>>         if (node->page[i])
>>>>>>>>             page_cache_release(node->page[i]);
>>>>>>>> -#endif
>>>>>>>> +    }
>>>>>>>>     kfree(node);
>>>>>>>> }
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.3.0
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>--
>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>>Linux NTFS maintainer
>>>>>>>
>>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux