Re: [PATCH] hfsplus: release bnode pages after use, not before

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
> On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>>> Hi Andrew and everybody else,
>>>
>>> I looked through the pre-git history and seem to have found the reason of how
>>> the current code had come to be, and why Sergei's fix seems to involve
>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>> debug code"
>>> commit below did not remove the right stuff.
>>>
>>> Looking at the old code, I think REF_PAGES may have started out being
>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>> need to release on bnode_free),
>>> then the code was modified for efficiency so that pages are not released
>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>
>>> I think the release at bnode_free was commented out because the person who
>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>> other/more places that one might need to release the pages than just
>>> at bnode_free().
>>>
>>> Does this train of thought make sense?
>>
>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>> at first it was my thought too. But later I considered what the logic
>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>> came to a conclusion that REF_PAGES=1 was a safe approach and
>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>> !REF_PAGES was what he really meant.
>>
>
> You are wrong, it seems. I found even earlier versions of the code,
> with history, at
> http://sourceforge.net/projects/linux-hfsplus/ .
> The earlier version of the code
> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
> does get_page and put_page at bnode_get and bnode_put, while the later version
> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
> )
> have those removed, dated 2001 december and 2002 june.

Interesting. I only looked at historical linux git.

> The change log says so:
>
> Date:   Thu Jun 6 09:45:14 2002 +0000
>
>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>
> The change also removed any page_cache_release() . It would seem that
> the additional REF_PAGES
> conditional was a fork/rewrite/clean-up which re-uses the earlier
> development logic, and eventually took over.
>
> The release-early code came first, not after. That makes sense, as
> releasing early

We may be talking about different things. I did not write what code
came earlier and what after. Anyway, this becomes to difficult to
follow. My point was that in the variant without refs, calling
page_cache_release() right after to read_cache_page() was exactly what
the author wanted (although it is a bug).

Again, we are probably arguing about different things. And I'm a
little less zealous than you to do this code archeology.

> and read_cache_page on each bnode_get is the slower, safer but
> inefficient approach.
> It would be absurd to go from caching pages, then to 'experiment' to
> see if release-early + later read_cache_page
> on each bnode_get 'improves'.
>
>>>
>>> Hin-Tak
>>>
>>> <commit>
>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>
>>>     [PATCH] hfs: remove debug code
>>>
>>>     This removes some old debug code, which is no longer needed.
>>>
>>>     Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx>
>>>     Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
>>>     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
>>>
>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>> index a096c5a..3d5cdc6 100644
>>> --- a/fs/hfs/bnode.c
>>> +++ b/fs/hfs/bnode.c
>>> @@ -13,8 +13,6 @@
>>>
>>>  #include "btree.h"
>>>
>>> -#define REF_PAGES    0
>>> -
>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>          int off, int len)
>>>  {
>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>              page_cache_release(page);
>>>              goto fail;
>>>          }
>>> -#if !REF_PAGES
>>>          page_cache_release(page);
>>> -#endif
>>>          node->page[i] = page;
>>>      }
>>>
>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>  {
>>>      if (node) {
>>>          atomic_inc(&node->refcnt);
>>> -#if REF_PAGES
>>> -        {
>>> -        int i;
>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -            get_page(node->page[i]);
>>> -        }
>>> -#endif
>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>      }
>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>          if (!atomic_read(&node->refcnt))
>>>              BUG();
>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>> -#if REF_PAGES
>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>> -                put_page(node->page[i]);
>>> -#endif
>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>              return;
>>> -        }
>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>              if (!node->page[i])
>>>                  continue;
>>>              mark_page_accessed(node->page[i]);
>>> -#if REF_PAGES
>>> -            put_page(node->page[i]);
>>> -#endif
>>>          }
>>>
>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>> index 8868d3b..b85abc6 100644
>>> --- a/fs/hfsplus/bnode.c
>>> +++ b/fs/hfsplus/bnode.c
>>> @@ -18,8 +18,6 @@
>>>  #include "hfsplus_fs.h"
>>>  #include "hfsplus_raw.h"
>>>
>>> -#define REF_PAGES    0
>>> -
>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>  {
>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>              page_cache_release(page);
>>>              goto fail;
>>>          }
>>> -#if !REF_PAGES
>>>          page_cache_release(page);
>>> -#endif
>>>          node->page[i] = page;
>>>      }
>>>
>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>  {
>>>      if (node) {
>>>          atomic_inc(&node->refcnt);
>>> -#if REF_PAGES
>>> -        {
>>> -        int i;
>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -            get_page(node->page[i]);
>>> -        }
>>> -#endif
>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>      }
>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>          if (!atomic_read(&node->refcnt))
>>>              BUG();
>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>> -#if REF_PAGES
>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>> -                put_page(node->page[i]);
>>> -#endif
>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>              return;
>>> -        }
>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>              if (!node->page[i])
>>>                  continue;
>>>              mark_page_accessed(node->page[i]);
>>> -#if REF_PAGES
>>> -            put_page(node->page[i]);
>>> -#endif
>>>          }
>>>
>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>> </commit>
>>>
>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> ------------------------------
>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>
>>>>>Hi Andrew,
>>>>>
>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>
>>>>>Feel free to add:
>>>>>
>>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx>
>>>>>
>>>>>Thanks a lot in advance!
>>>>>
>>>>>Best regards,
>>>>>
>>>>>    Anton
>>>>
>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>
>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>
>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>> I'm looking over the pre-git history to see how that comes about.
>>>>
>>>> Hin-Tak
>>>>
>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Fix this bugreport by Sasha Levin:
>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>
>>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
>>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
>>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx>
>>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
>>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx>
>>>>>> ---
>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>> index 759708f..5af50fb 100644
>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>             page_cache_release(page);
>>>>>>             goto fail;
>>>>>>         }
>>>>>> -        page_cache_release(page);
>>>>>>         node->page[i] = page;
>>>>>>     }
>>>>>>
>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>
>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>> {
>>>>>> -#if 0
>>>>>>     int i;
>>>>>>
>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>         if (node->page[i])
>>>>>>             page_cache_release(node->page[i]);
>>>>>> -#endif
>>>>>> +    }
>>>>>>     kfree(node);
>>>>>> }
>>>>>>
>>>>>> --
>>>>>> 2.3.0
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>--
>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>Linux NTFS maintainer
>>>>>
>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux