On 18 June 2015 at 17:19, Sergei Antonov <saproj@xxxxxxxxx> wrote: > On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >> On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >>>> Hi Andrew and everybody else, >>>> >>>> I looked through the pre-git history and seem to have found the reason of how >>>> the current code had come to be, and why Sergei's fix seems to involve >>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>> debug code" >>>> commit below did not remove the right stuff. >>>> >>>> Looking at the old code, I think REF_PAGES may have started out being >>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>> need to release on bnode_free), >>>> then the code was modified for efficiency so that pages are not released >>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>> Then it was flipped over, and seems to work, and things were forgotten. >>>> >>>> I think the release at bnode_free was commented out because the person who >>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>> other/more places that one might need to release the pages than just >>>> at bnode_free(). >>>> >>>> Does this train of thought make sense? >>> >>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>> at first it was my thought too. But later I considered what the logic >>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>> came to a conclusion that REF_PAGES=1 was a safe approach and >>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>> !REF_PAGES was what he really meant. >>> >> >> You are wrong, it seems. I found even earlier versions of the code, >> with history, at >> http://sourceforge.net/projects/linux-hfsplus/ . >> The earlier version of the code >> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >> does get_page and put_page at bnode_get and bnode_put, while the later version >> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >> ) >> have those removed, dated 2001 december and 2002 june. > > Interesting. I only looked at historical linux git. > >> The change log says so: >> >> Date: Thu Jun 6 09:45:14 2002 +0000 >> >> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >> >> The change also removed any page_cache_release() . It would seem that >> the additional REF_PAGES >> conditional was a fork/rewrite/clean-up which re-uses the earlier >> development logic, and eventually took over. >> >> The release-early code came first, not after. That makes sense, as >> releasing early > > We may be talking about different things. I did not write what code > came earlier and what after. Anyway, this becomes to difficult to > follow. My point was that in the variant without refs, calling > page_cache_release() right after to read_cache_page() was exactly what > the author wanted (although it is a bug). > > Again, we are probably arguing about different things. And I'm a > little less zealous than you to do this code archeology. > I would really prefer that you don't use personal descriptions such as "zealous" as well as making the kind of comments such as "The result of the test (which I'm sure will show the bug is gone) will be by far more valuable contribution than your other messages in the thread.". Those are not helpful. In my first reply I have already stated my reasoning: your fix consists mainly of removing an "if 0" and re-enabling seemingly dead code. So the dead code was written for a purpose which is now lost and forgotten. And here is that purpose - early development of the code was done with getting and putting pages within each routine, for safeness, and migrating towards caching pages for a longer life cycle in 2001-2002. When the code was cleaned up and re-worked before putting into the kernel in 2005, this development process was repeated but with a REF_PAGES conditional and some additional code which was // commented out. The migration to caching pages would have been more or less correct, if REF_PAGES was flipped and the // commented out code was enabled. Except there was a mistake in one of the conditionals, and also the commented out code never got enabled. The commented out code was where a !REF_PAGES supposed to be. the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES" was a mistake that has propagated. >> and read_cache_page on each bnode_get is the slower, safer but >> inefficient approach. >> It would be absurd to go from caching pages, then to 'experiment' to >> see if release-early + later read_cache_page >> on each bnode_get 'improves'. >> >>>> >>>> Hin-Tak >>>> >>>> <commit> >>>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>>> Date: Tue Sep 6 15:18:47 2005 -0700 >>>> >>>> [PATCH] hfs: remove debug code >>>> >>>> This removes some old debug code, which is no longer needed. >>>> >>>> Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>>> Signed-off-by: Andrew Morton <akpm@xxxxxxxx> >>>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx> >>>> >>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>>> index a096c5a..3d5cdc6 100644 >>>> --- a/fs/hfs/bnode.c >>>> +++ b/fs/hfs/bnode.c >>>> @@ -13,8 +13,6 @@ >>>> >>>> #include "btree.h" >>>> >>>> -#define REF_PAGES 0 >>>> - >>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>>> int off, int len) >>>> { >>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>> hfs_btree *tree, u32 cnid) >>>> page_cache_release(page); >>>> goto fail; >>>> } >>>> -#if !REF_PAGES >>>> page_cache_release(page); >>>> -#endif >>>> node->page[i] = page; >>>> } >>>> >>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>> { >>>> if (node) { >>>> atomic_inc(&node->refcnt); >>>> -#if REF_PAGES >>>> - { >>>> - int i; >>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>> - get_page(node->page[i]); >>>> - } >>>> -#endif >>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> } >>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> if (!atomic_read(&node->refcnt)) >>>> BUG(); >>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>> -#if REF_PAGES >>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>> - put_page(node->page[i]); >>>> -#endif >>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>> return; >>>> - } >>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>> if (!node->page[i]) >>>> continue; >>>> mark_page_accessed(node->page[i]); >>>> -#if REF_PAGES >>>> - put_page(node->page[i]); >>>> -#endif >>>> } >>>> >>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>> index 8868d3b..b85abc6 100644 >>>> --- a/fs/hfsplus/bnode.c >>>> +++ b/fs/hfsplus/bnode.c >>>> @@ -18,8 +18,6 @@ >>>> #include "hfsplus_fs.h" >>>> #include "hfsplus_raw.h" >>>> >>>> -#define REF_PAGES 0 >>>> - >>>> /* Copy a specified range of bytes from the raw data of a node */ >>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>>> { >>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>> hfs_btree *tree, u32 cnid) >>>> page_cache_release(page); >>>> goto fail; >>>> } >>>> -#if !REF_PAGES >>>> page_cache_release(page); >>>> -#endif >>>> node->page[i] = page; >>>> } >>>> >>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>> { >>>> if (node) { >>>> atomic_inc(&node->refcnt); >>>> -#if REF_PAGES >>>> - { >>>> - int i; >>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>> - get_page(node->page[i]); >>>> - } >>>> -#endif >>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> } >>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>> if (!atomic_read(&node->refcnt)) >>>> BUG(); >>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>> -#if REF_PAGES >>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>> - put_page(node->page[i]); >>>> -#endif >>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>> return; >>>> - } >>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>> if (!node->page[i]) >>>> continue; >>>> mark_page_accessed(node->page[i]); >>>> -#if REF_PAGES >>>> - put_page(node->page[i]); >>>> -#endif >>>> } >>>> >>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>> </commit> >>>> >>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote: >>>>> ------------------------------ >>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>>> >>>>>>Hi Andrew, >>>>>> >>>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>>> >>>>>>Feel free to add: >>>>>> >>>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx> >>>>>> >>>>>>Thanks a lot in advance! >>>>>> >>>>>>Best regards, >>>>>> >>>>>> Anton >>>>> >>>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>>> >>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>>> >>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>>> I'm looking over the pre-git history to see how that comes about. >>>>> >>>>> Hin-Tak >>>>> >>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>>>>>> >>>>>>> Fix this bugreport by Sasha Levin: >>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>>> >>>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> >>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >>>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx> >>>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> >>>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx> >>>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> >>>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx> >>>>>>> --- >>>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>>> index 759708f..5af50fb 100644 >>>>>>> --- a/fs/hfsplus/bnode.c >>>>>>> +++ b/fs/hfsplus/bnode.c >>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>>> page_cache_release(page); >>>>>>> goto fail; >>>>>>> } >>>>>>> - page_cache_release(page); >>>>>>> node->page[i] = page; >>>>>>> } >>>>>>> >>>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>>> >>>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>>> { >>>>>>> -#if 0 >>>>>>> int i; >>>>>>> >>>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>>> if (node->page[i]) >>>>>>> page_cache_release(node->page[i]); >>>>>>> -#endif >>>>>>> + } >>>>>>> kfree(node); >>>>>>> } >>>>>>> >>>>>>> -- >>>>>>> 2.3.0 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>>>-- >>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>>Linux NTFS maintainer >>>>>> >>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html