On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: > On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote: >> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >>> Hi Andrew and everybody else, >>> >>> I looked through the pre-git history and seem to have found the reason of how >>> the current code had come to be, and why Sergei's fix seems to involve >>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>> debug code" >>> commit below did not remove the right stuff. >>> >>> Looking at the old code, I think REF_PAGES may have started out being >>> 1 (i.e. pages are released right after create, then get/put when needed, no >>> need to release on bnode_free), >>> then the code was modified for efficiency so that pages are not released >>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>> Then it was flipped over, and seems to work, and things were forgotten. >>> >>> I think the release at bnode_free was commented out because the person who >>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>> other/more places that one might need to release the pages than just >>> at bnode_free(). >>> >>> Does this train of thought make sense? >> >> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >> at first it was my thought too. But later I considered what the logic >> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >> came to a conclusion that REF_PAGES=1 was a safe approach and >> REF_PAGES=0 was an experimental unsafe release-early approach, so >> !REF_PAGES was what he really meant. >> > > You are wrong, it seems. I found even earlier versions of the code, > with history, at > http://sourceforge.net/projects/linux-hfsplus/ . > The earlier version of the code > (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) > does get_page and put_page at bnode_get and bnode_put, while the later version > (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ > ) > have those removed, dated 2001 december and 2002 june. Interesting. I only looked at historical linux git. > The change log says so: > > Date: Thu Jun 6 09:45:14 2002 +0000 > > Use buffer cache instead of page cache in bnode.c. Cache inode extents. > > The change also removed any page_cache_release() . It would seem that > the additional REF_PAGES > conditional was a fork/rewrite/clean-up which re-uses the earlier > development logic, and eventually took over. > > The release-early code came first, not after. That makes sense, as > releasing early We may be talking about different things. I did not write what code came earlier and what after. Anyway, this becomes to difficult to follow. My point was that in the variant without refs, calling page_cache_release() right after to read_cache_page() was exactly what the author wanted (although it is a bug). Again, we are probably arguing about different things. And I'm a little less zealous than you to do this code archeology. > and read_cache_page on each bnode_get is the slower, safer but > inefficient approach. > It would be absurd to go from caching pages, then to 'experiment' to > see if release-early + later read_cache_page > on each bnode_get 'improves'. > >>> >>> Hin-Tak >>> >>> <commit> >>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>> Date: Tue Sep 6 15:18:47 2005 -0700 >>> >>> [PATCH] hfs: remove debug code >>> >>> This removes some old debug code, which is no longer needed. >>> >>> Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>> Signed-off-by: Andrew Morton <akpm@xxxxxxxx> >>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx> >>> >>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>> index a096c5a..3d5cdc6 100644 >>> --- a/fs/hfs/bnode.c >>> +++ b/fs/hfs/bnode.c >>> @@ -13,8 +13,6 @@ >>> >>> #include "btree.h" >>> >>> -#define REF_PAGES 0 >>> - >>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>> int off, int len) >>> { >>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> -#if !REF_PAGES >>> page_cache_release(page); >>> -#endif >>> node->page[i] = page; >>> } >>> >>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>> { >>> if (node) { >>> atomic_inc(&node->refcnt); >>> -#if REF_PAGES >>> - { >>> - int i; >>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>> - get_page(node->page[i]); >>> - } >>> -#endif >>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> } >>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> if (!atomic_read(&node->refcnt)) >>> BUG(); >>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>> -#if REF_PAGES >>> - for (i = 0; i < tree->pages_per_bnode; i++) >>> - put_page(node->page[i]); >>> -#endif >>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>> return; >>> - } >>> for (i = 0; i < tree->pages_per_bnode; i++) { >>> if (!node->page[i]) >>> continue; >>> mark_page_accessed(node->page[i]); >>> -#if REF_PAGES >>> - put_page(node->page[i]); >>> -#endif >>> } >>> >>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>> index 8868d3b..b85abc6 100644 >>> --- a/fs/hfsplus/bnode.c >>> +++ b/fs/hfsplus/bnode.c >>> @@ -18,8 +18,6 @@ >>> #include "hfsplus_fs.h" >>> #include "hfsplus_raw.h" >>> >>> -#define REF_PAGES 0 >>> - >>> /* Copy a specified range of bytes from the raw data of a node */ >>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>> { >>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>> hfs_btree *tree, u32 cnid) >>> page_cache_release(page); >>> goto fail; >>> } >>> -#if !REF_PAGES >>> page_cache_release(page); >>> -#endif >>> node->page[i] = page; >>> } >>> >>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>> { >>> if (node) { >>> atomic_inc(&node->refcnt); >>> -#if REF_PAGES >>> - { >>> - int i; >>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>> - get_page(node->page[i]); >>> - } >>> -#endif >>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> } >>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>> if (!atomic_read(&node->refcnt)) >>> BUG(); >>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>> -#if REF_PAGES >>> - for (i = 0; i < tree->pages_per_bnode; i++) >>> - put_page(node->page[i]); >>> -#endif >>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>> return; >>> - } >>> for (i = 0; i < tree->pages_per_bnode; i++) { >>> if (!node->page[i]) >>> continue; >>> mark_page_accessed(node->page[i]); >>> -#if REF_PAGES >>> - put_page(node->page[i]); >>> -#endif >>> } >>> >>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>> </commit> >>> >>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote: >>>> ------------------------------ >>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>> >>>>>Hi Andrew, >>>>> >>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>> >>>>>Feel free to add: >>>>> >>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx> >>>>> >>>>>Thanks a lot in advance! >>>>> >>>>>Best regards, >>>>> >>>>> Anton >>>> >>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>> >>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>> >>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>> I'm looking over the pre-git history to see how that comes about. >>>> >>>> Hin-Tak >>>> >>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>>>>> >>>>>> Fix this bugreport by Sasha Levin: >>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>> >>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> >>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx> >>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> >>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx> >>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> >>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx> >>>>>> --- >>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>> index 759708f..5af50fb 100644 >>>>>> --- a/fs/hfsplus/bnode.c >>>>>> +++ b/fs/hfsplus/bnode.c >>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>> page_cache_release(page); >>>>>> goto fail; >>>>>> } >>>>>> - page_cache_release(page); >>>>>> node->page[i] = page; >>>>>> } >>>>>> >>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>> >>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>> { >>>>>> -#if 0 >>>>>> int i; >>>>>> >>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>> if (node->page[i]) >>>>>> page_cache_release(node->page[i]); >>>>>> -#endif >>>>>> + } >>>>>> kfree(node); >>>>>> } >>>>>> >>>>>> -- >>>>>> 2.3.0 >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>>-- >>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>Linux NTFS maintainer >>>>> >>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html