Hi Andrew,

> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@xxxxxxxxx> wrote:
>>>> You are basically saying you don't understand it. Too bad, because the
>>>> bug is very simple. It is the "use after free" type of bug, and it can
>>>> be illustrated by this:
>>>> (1) void *ptr = malloc(...);
>>>> (2) free(ptr);
>>>> (3) memcpy(..., ptr, 1);
>>>> Guess which two of these three lines are executed in the wrong order.
>>>>
>>>> My patch is about the same type of bug, but with memory page mapping.
>>>> The driver currently accesses pages that may be unavailable or may
>>>> contain different data. The problem is more likely to occur when
>>>> memory is a limited resource. I reproduced it while running a
>>>> memory-hungry program.
>>>
>>> I worried not about myself but about potential readers of the
>>> description of the fix. The description is completely obscure, and it
>>> needs to describe the fix in a clear and descriptive manner. This is my
>>> request. Please describe the fix in a clear way.
>>
>> The description is just right.
>
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.
>
> The code is distressingly undocumented and has been that way since
> Roman Zippel's original commit in 2004.
>
> From the looks of it, that loop in __hfs_bnode_create() is simply doing
> readahead and is designed as a performance optimisation. The pages are
> pulled into pagecache in the expectation that they will soon be
> accessed. What your patch does is to instead pin the pages in
> pagecache until the bnode is freed. If we're going to do this then we
> need to be very careful about worst-case scenarios: we could even run
> the machine out of memory.
>
> If I'm correct, and this is just readahead, then the bug lies elsewhere:
> if other parts of hfsplus are assuming that this memory is in pagecache
> then that's an error - that code (where is it?) should instead be
> performing a pagecache lookup and, if the page isn't present, read it
> from disk again.
>
> But for others to be able to review and understand this change and
> suggest alternatives, we'll need a much, much better changelog!

Given I had just looked at the code...

__hfs_bnode_create() is not doing read-ahead at all from my reading of
the code. What it does is gather the needed pages that are immediately
processed, i.e. hfs_bnode_find() does:

- call __hfs_bnode_create(), which gathers the pages into the array of
  pages @node->page;

- kmap() some page from the @node->page array, read some information
  from the kmapped page, then kunmap() it again (this is actually insane
  - either all the pages should have been kmapped in __hfs_bnode_create()
  or it should be using kmap_atomic()!).

Here is the actual code:

	desc = (struct hfs_bnode_desc *)(kmap(node->page[0]) +
					 node->page_offset);
	node->prev = be32_to_cpu(desc->prev);
	node->next = be32_to_cpu(desc->next);
	node->num_recs = be16_to_cpu(desc->num_recs);
	node->type = desc->type;
	node->height = desc->height;
	kunmap(node->page[0]);

That actually makes some sense (though kmap_atomic() would still be much
better!) but then this follows:

	off = hfs_bnode_read_u16(node, rec_off);

and (omitting lines for clarity):

	for (i = 1; i <= node->num_recs; off = next_off, i++) {
		...
		next_off = hfs_bnode_read_u16(node, rec_off);
		...
		key_size = hfs_bnode_read_u16(node, off) + 2;
		...
	}

All those hfs_bnode_read_u16() calls are simply "kmap, copy u16 from the
kmapped page, kunmap", so again a crazy thing to be doing - either use
kmap_atomic(), or probably better still, just kmap() all the pages in the
@node->page array up front, at the time they are all read in...
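For reference, each of those reads boils down to roughly the following.
This is a simplified sketch from my reading of fs/hfsplus/bnode.c (it
ignores reads that straddle a page boundary), so treat it as an
illustration of the access pattern rather than the verbatim code:

	/* Copy @len bytes starting at node offset @off into @buf. */
	void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
	{
		struct page *page;

		off += node->page_offset;
		page = node->page[off >> PAGE_CACHE_SHIFT];
		off &= ~PAGE_CACHE_MASK;

		/* A full kmap()/kunmap() round trip for every read. */
		memcpy(buf, kmap(page) + off, len);
		kunmap(page);
	}

	u16 hfs_bnode_read_u16(struct hfs_bnode *node, int off)
	{
		__be16 data;

		/* Each u16 goes through the generic byte-copy helper above. */
		hfs_bnode_read(node, &data, off, 2);
		return be16_to_cpu(data);
	}

So every u16 fetched from the node costs a kmap()/kunmap() pair, which
is why mapping each page once (or using kmap_atomic()) would already be
an improvement.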
If you consider __hfs_bnode_create() to be doing readahead, then all
those calls to hfs_bnode_read_u16() would have to do a
read_mapping_page() and incur the overhead of a radix tree lookup for
each u16 read. That would not be very good for performance/CPU usage.

Also note there is no danger of running out of RAM, as the largest
allowed B-tree node size in HFS+ according to Apple's own specification
is 32kiB, i.e. with a 4k page size that is only 8 pages maximum -
readpages functions deal with significantly more pages at a time without
worrying about running out of RAM.

But yes, I agree with you that HFS+ is an undocumented mess, but at
least Sergei is putting some effort into making it better!

Best regards,

	Anton

-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer