On 18 June 2015 at 21:51, Sergei Antonov <saproj@xxxxxxxxx> wrote: > On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >> On 18 June 2015 at 17:19, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote: >>>>>> Hi Andrew and everybody else, >>>>>> >>>>>> I looked through the pre-git history and seem to have found the reason of how >>>>>> the current code had come to be, and why Sergei's fix seems to involve >>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry, >>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old >>>>>> debug code" >>>>>> commit below did not remove the right stuff. >>>>>> >>>>>> Looking at the old code, I think REF_PAGES may have started out being >>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no >>>>>> need to release on bnode_free), >>>>>> then the code was modified for efficiency so that pages are not released >>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free. >>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented >>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE). >>>>>> Then it was flipped over, and seems to work, and things were forgotten. >>>>>> >>>>>> I think the release at bnode_free was commented out because the person who >>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be >>>>>> other/more places that one might need to release the pages than just >>>>>> at bnode_free(). >>>>>> >>>>>> Does this train of thought make sense? >>>>> >>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well, >>>>> at first it was my thought too. But later I considered what the logic >>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and >>>>> came to a conclusion that REF_PAGES=1 was a safe approach and >>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so >>>>> !REF_PAGES was what he really meant. >>>>> >>>> >>>> You are wrong, it seems. I found even earlier versions of the code, >>>> with history, at >>>> http://sourceforge.net/projects/linux-hfsplus/ . >>>> The earlier version of the code >>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/) >>>> does get_page and put_page at bnode_get and bnode_put, while the later version >>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ >>>> ) >>>> have those removed, dated 2001 december and 2002 june. >>> >>> Interesting. I only looked at historical linux git. >>> >>>> The change log says so: >>>> >>>> Date: Thu Jun 6 09:45:14 2002 +0000 >>>> >>>> Use buffer cache instead of page cache in bnode.c. Cache inode extents. >>>> >>>> The change also removed any page_cache_release() . It would seem that >>>> the additional REF_PAGES >>>> conditional was a fork/rewrite/clean-up which re-uses the earlier >>>> development logic, and eventually took over. >>>> >>>> The release-early code came first, not after. That makes sense, as >>>> releasing early >>> >>> We may be talking about different things. I did not write what code >>> came earlier and what after. Anyway, this becomes to difficult to >>> follow. My point was that in the variant without refs, calling >>> page_cache_release() right after to read_cache_page() was exactly what >>> the author wanted (although it is a bug). >>> >>> Again, we are probably arguing about different things. And I'm a >>> little less zealous than you to do this code archeology. >>> >> >> I would really prefer that you don't use personal descriptions such as "zealous" >> as well as making the kind of comments such as >> "The result of the test (which I'm sure will show the bug is gone) will >> be by far more valuable contribution than your other messages in the >> thread.". Those are not helpful. >> >> In my first reply I have already stated my reasoning: your fix consists >> mainly of removing an "if 0" and re-enabling seemingly dead code. So the >> dead code was written for a purpose which is now lost and forgotten. > > Of course. And it is not hard to guess that purpose quite reliably :). > The code used to release pages correctly, i. e. when freeing the bnode > structure. > >> And here is that purpose - early development of the code was done >> with getting and putting pages within each routine, for safeness, and migrating >> towards caching pages for a longer life cycle in 2001-2002. When the >> code was cleaned >> up and re-worked before putting into the kernel in 2005, this >> development process >> was repeated but with a REF_PAGES conditional and some additional code >> which was // commented out. >> >> The migration to caching pages >> would have been more or less correct, if REF_PAGES was flipped >> and the // commented out code was enabled. Except there was a mistake >> in one of the conditionals > > By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not > a mistake under the conditions you provided above. With REF_PAGES > flipped to 1 and the // piece uncommented you don't want to release in > __hfs_bnode_create() because releasing is done in hfs_bnode_free(). > Yes, that's what I am saying. I believe the development of the code happens as follows: - bnode_get() does get_page(), bnode_put() does put_page(), and the rest of the routines - i.e. bnode_create() look up the pages but immediately release them. bnode_free() does nothing about pages. - then it was desired to keep the pages longer without the repeated get_pages and put_pages. so those are "#if REF_PAGES" out. And we no longer get and put pages on demand and should also stop releasing them after looking up at bnode_create() also. (the bnode_create() is a poor name for what it does, but never mind). New code is added to release at bnode_free(). The new code should have been spanned by the opposite, i.e. "#if !REF_PAGES", but was inserted as place holder with //. The rest of the story I have already tried to explain a couple of times. The switch from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per bnode_get/bnode_put routines, to keeping pages around longer between bnode_create/bnode_free) did not quite work correctly. For the others who want to see the version of the code with those REF_PAGES and want to weigh in on this, I cannot find a web interface for the repo, but it is the commit below in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git commit 91556682e0bf004d98a529bf829d339abb98bbbd Author: Andrew Morton <akpm@xxxxxxxx> Date: Wed Feb 25 16:17:48 2004 -0800 [PATCH] HFS+ support From: Roman Zippel <zippel@xxxxxxxxxxxxxx> This driver adds full read/write support for HFS+ and is based on the readonly driver by Brad Broyer. Thanks to Ethan Benson <erbenson@xxxxxxxxxx> for a number of patches to make the driver more compliant with the spec. I would also repeat that earlier code and change history was at http://sourceforge.net/projects/linux-hfsplus/, and it showed the changes between 2001 dec to 2002 june from get_page/put_page per bnode_get/bnode_put to removing those and switching to a longer page life-cycle with the log message: "Use buffer cache instead of page cache in bnode.c. Cache inode extents." >>, and also the commented out code never got >> enabled. The commented out code was where a !REF_PAGES supposed to be. >> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES" >> was a mistake that has propagated. >> >> >>>> and read_cache_page on each bnode_get is the slower, safer but >>>> inefficient approach. >>>> It would be absurd to go from caching pages, then to 'experiment' to >>>> see if release-early + later read_cache_page >>>> on each bnode_get 'improves'. >>>> >>>>>> >>>>>> Hin-Tak >>>>>> >>>>>> <commit> >>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d >>>>>> Author: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>>>>> Date: Tue Sep 6 15:18:47 2005 -0700 >>>>>> >>>>>> [PATCH] hfs: remove debug code >>>>>> >>>>>> This removes some old debug code, which is no longer needed. >>>>>> >>>>>> Signed-off-by: Roman Zippel <zippel@xxxxxxxxxxxxxx> >>>>>> Signed-off-by: Andrew Morton <akpm@xxxxxxxx> >>>>>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx> >>>>>> >>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c >>>>>> index a096c5a..3d5cdc6 100644 >>>>>> --- a/fs/hfs/bnode.c >>>>>> +++ b/fs/hfs/bnode.c >>>>>> @@ -13,8 +13,6 @@ >>>>>> >>>>>> #include "btree.h" >>>>>> >>>>>> -#define REF_PAGES 0 >>>>>> - >>>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, >>>>>> int off, int len) >>>>>> { >>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>>> hfs_btree *tree, u32 cnid) >>>>>> page_cache_release(page); >>>>>> goto fail; >>>>>> } >>>>>> -#if !REF_PAGES >>>>>> page_cache_release(page); >>>>>> -#endif >>>>>> node->page[i] = page; >>>>>> } >>>>>> >>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>>> { >>>>>> if (node) { >>>>>> atomic_inc(&node->refcnt); >>>>>> -#if REF_PAGES >>>>>> - { >>>>>> - int i; >>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>> - get_page(node->page[i]); >>>>>> - } >>>>>> -#endif >>>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> } >>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> if (!atomic_read(&node->refcnt)) >>>>>> BUG(); >>>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>>> -#if REF_PAGES >>>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>>> return; >>>>>> - } >>>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>>> if (!node->page[i]) >>>>>> continue; >>>>>> mark_page_accessed(node->page[i]); >>>>>> -#if REF_PAGES >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> } >>>>>> >>>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>> index 8868d3b..b85abc6 100644 >>>>>> --- a/fs/hfsplus/bnode.c >>>>>> +++ b/fs/hfsplus/bnode.c >>>>>> @@ -18,8 +18,6 @@ >>>>>> #include "hfsplus_fs.h" >>>>>> #include "hfsplus_raw.h" >>>>>> >>>>>> -#define REF_PAGES 0 >>>>>> - >>>>>> /* Copy a specified range of bytes from the raw data of a node */ >>>>>> void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) >>>>>> { >>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct >>>>>> hfs_btree *tree, u32 cnid) >>>>>> page_cache_release(page); >>>>>> goto fail; >>>>>> } >>>>>> -#if !REF_PAGES >>>>>> page_cache_release(page); >>>>>> -#endif >>>>>> node->page[i] = page; >>>>>> } >>>>>> >>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node) >>>>>> { >>>>>> if (node) { >>>>>> atomic_inc(&node->refcnt); >>>>>> -#if REF_PAGES >>>>>> - { >>>>>> - int i; >>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>> - get_page(node->page[i]); >>>>>> - } >>>>>> -#endif >>>>>> dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n", >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> } >>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node) >>>>>> node->tree->cnid, node->this, atomic_read(&node->refcnt)); >>>>>> if (!atomic_read(&node->refcnt)) >>>>>> BUG(); >>>>>> - if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) { >>>>>> -#if REF_PAGES >>>>>> - for (i = 0; i < tree->pages_per_bnode; i++) >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> + if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) >>>>>> return; >>>>>> - } >>>>>> for (i = 0; i < tree->pages_per_bnode; i++) { >>>>>> if (!node->page[i]) >>>>>> continue; >>>>>> mark_page_accessed(node->page[i]); >>>>>> -#if REF_PAGES >>>>>> - put_page(node->page[i]); >>>>>> -#endif >>>>>> } >>>>>> >>>>>> if (test_bit(HFS_BNODE_DELETED, &node->flags)) { >>>>>> </commit> >>>>>> >>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote: >>>>>>> ------------------------------ >>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote: >>>>>>> >>>>>>>>Hi Andrew, >>>>>>>> >>>>>>>>Can you please take this patch up and get it merged into mainline? Despite Vyacheslav's lamentations this patch is obviously correct. __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed. >>>>>>>> >>>>>>>>Feel free to add: >>>>>>>> >>>>>>>>Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx> >>>>>>>> >>>>>>>>Thanks a lot in advance! >>>>>>>> >>>>>>>>Best regards, >>>>>>>> >>>>>>>> Anton >>>>>>> >>>>>>> I went around and looked at the code and Anton is right - __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong. >>>>>>> >>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point. >>>>>>> >>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead. >>>>>>> I'm looking over the pre-git history to see how that comes about. >>>>>>> >>>>>>> Hin-Tak >>>>>>> >>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@xxxxxxxxx> wrote: >>>>>>>>> >>>>>>>>> Fix this bugreport by Sasha Levin: >>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free") >>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode. >>>>>>>>> >>>>>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>>>>>>>> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> >>>>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >>>>>>>>> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx> >>>>>>>>> Cc: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> >>>>>>>>> Cc: Sougata Santra <sougata@xxxxxxxxxx> >>>>>>>>> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> >>>>>>>>> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx> >>>>>>>>> --- >>>>>>>>> fs/hfsplus/bnode.c | 6 ++---- >>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c >>>>>>>>> index 759708f..5af50fb 100644 >>>>>>>>> --- a/fs/hfsplus/bnode.c >>>>>>>>> +++ b/fs/hfsplus/bnode.c >>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) >>>>>>>>> page_cache_release(page); >>>>>>>>> goto fail; >>>>>>>>> } >>>>>>>>> - page_cache_release(page); >>>>>>>>> node->page[i] = page; >>>>>>>>> } >>>>>>>>> >>>>>>>>> @@ -566,13 +565,12 @@ node_error: >>>>>>>>> >>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node) >>>>>>>>> { >>>>>>>>> -#if 0 >>>>>>>>> int i; >>>>>>>>> >>>>>>>>> - for (i = 0; i < node->tree->pages_per_bnode; i++) >>>>>>>>> + for (i = 0; i < node->tree->pages_per_bnode; i++) { >>>>>>>>> if (node->page[i]) >>>>>>>>> page_cache_release(node->page[i]); >>>>>>>>> -#endif >>>>>>>>> + } >>>>>>>>> kfree(node); >>>>>>>>> } >>>>>>>>> >>>>>>>>> -- >>>>>>>>> 2.3.0 >>>>>>>>> >>>>>>>>> -- >>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>>>-- >>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @) >>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/ >>>>>>>>Linux NTFS maintainer >>>>>>>> >>>>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html