On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: > On 18 June 2015 at 17:19, Sergei Antonov <saproj@xxxxxxxxx> wrote: >> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >>>>> Hi Andrew and everybody else, >>>>> >>>>> I looked through the pre-git history and seem to have found the reason of how >>>>> the current code had come to be, and why Sergei's fix seems to involve >>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>>> debug code" >>>>> commit below did not remove the right stuff. >>>>> >>>>> Looking at the old code, I think REF_PAGES may have started out being >>>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>>> need to release on bnode_free), >>>>> then the code was modified for efficiency so that pages are not released >>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>>> Then it was flipped over, and seems to work, and things were forgotten. >>>>> >>>>> I think the release at bnode_free was commented out because the person who >>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>>> other/more places that one might need to release the pages than just >>>>> at bnode_free(). >>>>> >>>>> Does this train of thought make sense? >>>> >>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>>> at first it was my thought too. But later I considered what the logic >>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>>> came to a conclusion that REF_PAGES=1 was a safe approach and >>>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>>> !REF_PAGES was what he really meant. >>>> >>> >>> You are wrong, it seems. I found even earlier versions of the code, >>> with history, at >>> http://sourceforge.net/projects/linux-hfsplus/ . >>> The earlier version of the code >>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >>> does get_page and put_page at bnode_get and bnode_put, while the later version >>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>> ) >>> have those removed, dated 2001 december and 2002 june. >> >> Interesting. I only looked at historical linux git. >> >>> The change log says so: >>> >>> Date: Thu Jun 6 09:45:14 2002 +0000 >>> >>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>> >>> The change also removed any page_cache_release() . It would seem that >>> the additional REF_PAGES >>> conditional was a fork/rewrite/clean-up which re-uses the earlier >>> development logic, and eventually took over. >>> >>> The release-early code came first, not after. That makes sense, as >>> releasing early >> >> We may be talking about different things. I did not write what code >> came earlier and what after. Anyway, this becomes to difficult to >> follow. My point was that in the variant without refs, calling >> page_cache_release() right after to read_cache_page() was exactly what >> the author wanted (although it is a bug). >> >> Again, we are probably arguing about different things. And I'm a >> little less zealous than you to do this code archeology. >> > > I would really prefer that you don't use personal descriptions such as "zealous" > as well as making the kind of comments such as > "The result of the test (which I'm sure will show the bug is gone) will > be by far more valuable contribution than your other messages in the > thread.". Those are not helpful. > > In my first reply I have already stated my reasoning: your fix consists > mainly of removing an "if 0" and re-enabling seemingly dead code. So the > dead code was written for a purpose which is now lost and forgotten. Of course. And it is not hard to guess that purpose quite reliably :). The code used to release pages correctly, i. e. when freeing the bnode structure. > And here is that purpose - early development of the code was done > with getting and putting pages within each routine, for safeness, and migrating > towards caching pages for a longer life cycle in 2001-2002. When the > code was cleaned > up and re-worked before putting into the kernel in 2005, this > development process > was repeated but with a REF_PAGES conditional and some additional code > which was // commented out. > > The migration to caching pages > would have been more or less correct, if REF_PAGES was flipped > and the // commented out code was enabled. Except there was a mistake > in one of the conditionals By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not a mistake under the conditions you provided above. With REF_PAGES flipped to 1 and the // piece uncommented you don't want to release in __hfs_bnode_create() because releasing is done in hfs_bnode_free(). >, and also the commented out code never got > enabled. The commented out code was where a !REF_PAGES supposed to be. > the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES" > was a mistake that has propagated. > > >>> and read_cache_page on each bnode_get is the slower, safer but >>> inefficient approach. >>> It would be absurd to go from caching pages, then to 'experiment' to >>> see if release-early + later read_cache_page >>> on each bnode_get 'improves'. >>> >>>>> >>>>> Hin-Tak >>>>> >>>>> <commit> >>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>>>> Date: Tue Sep 6 15:18:47 2005 -0700 >>>>> >>>>> [PATCH] hfs: remove debug code >>>>> >>>>> This removes some old debug code, which is no longer needed. >>>>> >>>>> Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>>>> Signed-off-by: Andrew Morton <akpm@xxxxxxxx> >>>>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx> >>>>> >>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>>>> index a096c5a..3d5cdc6 100644 >>>>> --- a/fs/hfs/bnode.c >>>>> +++ b/fs/hfs/bnode.c >>>>> @@ -13,8 +13,6 @@ >>>>> >>>>> #include "btree.h" >>>>> >>>>> -#define REF_PAGES 0 >>>>> - >>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>>>> int off, int len) >>>>> { >>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>> hfs_btree *tree, u32 cnid) >>>>> page_cache_release(page); >>>>> goto fail; >>>>> } >>>>> -#if !REF_PAGES >>>>> page_cache_release(page); >>>>> -#endif >>>>> node->page[i] = page; >>>>> } >>>>> >>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>> { >>>>> if (node) { >>>>> atomic_inc(&node->refcnt); >>>>> -#if REF_PAGES >>>>> - { >>>>> - int i; >>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>> - get_page(node->page[i]); >>>>> - } >>>>> -#endif >>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> } >>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> if (!atomic_read(&node->refcnt)) >>>>> BUG(); >>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>> -#if REF_PAGES >>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>> return; >>>>> - } >>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>> if (!node->page[i]) >>>>> continue; >>>>> mark_page_accessed(node->page[i]); >>>>> -#if REF_PAGES >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> } >>>>> >>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>> index 8868d3b..b85abc6 100644 >>>>> --- a/fs/hfsplus/bnode.c >>>>> +++ b/fs/hfsplus/bnode.c >>>>> @@ -18,8 +18,6 @@ >>>>> #include "hfsplus_fs.h" >>>>> #include "hfsplus_raw.h" >>>>> >>>>> -#define REF_PAGES 0 >>>>> - >>>>> /* Copy a specified range of bytes from the raw data of a node */ >>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>>>> { >>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>> hfs_btree *tree, u32 cnid) >>>>> page_cache_release(page); >>>>> goto fail; >>>>> } >>>>> -#if !REF_PAGES >>>>> page_cache_release(page); >>>>> -#endif >>>>> node->page[i] = page; >>>>> } >>>>> >>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>> { >>>>> if (node) { >>>>> atomic_inc(&node->refcnt); >>>>> -#if REF_PAGES >>>>> - { >>>>> - int i; >>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>> - get_page(node->page[i]); >>>>> - } >>>>> -#endif >>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> } >>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>> if (!atomic_read(&node->refcnt)) >>>>> BUG(); >>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>> -#if REF_PAGES >>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>> return; >>>>> - } >>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>> if (!node->page[i]) >>>>> continue; >>>>> mark_page_accessed(node->page[i]); >>>>> -#if REF_PAGES >>>>> - put_page(node->page[i]); >>>>> -#endif >>>>> } >>>>> >>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>> </commit> >>>>> >>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote: >>>>>> ------------------------------ >>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>>>> >>>>>>>Hi Andrew, >>>>>>> >>>>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>>>> >>>>>>>Feel free to add: >>>>>>> >>>>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx> >>>>>>> >>>>>>>Thanks a lot in advance! >>>>>>> >>>>>>>Best regards, >>>>>>> >>>>>>> Anton >>>>>> >>>>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>>>> >>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>>>> >>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>>>> I'm looking over the pre-git history to see how that comes about. >>>>>> >>>>>> Hin-Tak >>>>>> >>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>>>>>>> >>>>>>>> Fix this bugreport by Sasha Levin: >>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>>>> >>>>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>>>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> >>>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >>>>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx> >>>>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> >>>>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx> >>>>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> >>>>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx> >>>>>>>> --- >>>>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>>>> index 759708f..5af50fb 100644 >>>>>>>> --- a/fs/hfsplus/bnode.c >>>>>>>> +++ b/fs/hfsplus/bnode.c >>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>>>> page_cache_release(page); >>>>>>>> goto fail; >>>>>>>> } >>>>>>>> - page_cache_release(page); >>>>>>>> node->page[i] = page; >>>>>>>> } >>>>>>>> >>>>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>>>> >>>>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>>>> { >>>>>>>> -#if 0 >>>>>>>> int i; >>>>>>>> >>>>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>>>> if (node->page[i]) >>>>>>>> page_cache_release(node->page[i]); >>>>>>>> -#endif >>>>>>>> + } >>>>>>>> kfree(node); >>>>>>>> } >>>>>>>> >>>>>>>> -- >>>>>>>> 2.3.0 >>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>>>-- >>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>>>Linux NTFS maintainer >>>>>>> >>>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html