v1 -> v2: * Explained what the bug leads to. * A list of related bug reports was expanded. * More specific technical explanation of the change. Fix the bug which manifests itself in either of two ways. 1. A crash in hfsplus_bnode_read()/hfsplus_brec_lenoff() at an attempt to read unavailable memory. 2. Dmesg errors "recoff %d too large\n" and "keylen %d too large\n" followed by inaccessibility of some or all directories on the mounted hfsplus volume. Example (by Luc Pionchon): $ ls foo/ ls: reading directory foo/: Invalid argument These problems have been reported multiple times: * https://lkml.org/lkml/2015/2/20/85 * https://lkml.org/lkml/2014/10/11/229 * https://bugzilla.kernel.org/show_bug.cgi?id=42342 * https://bugzilla.kernel.org/show_bug.cgi?id=63841 * https://bugzilla.kernel.org/show_bug.cgi?id=78761 * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1332950 * http://comments.gmane.org/gmane.linux.file-systems/73200 * A "du"-related issue mentioned by Hin-Tak Leung on linux-fsdevel@ several times. The following quotation summarizes it. >>> an issue I have had which is quite reproducible: >>> If I merely run "du" repeatedly on a large directory on an HFS+ >>> formatted drive, on a somewhat resource-tight system (having firefox >>> running with lots windows seems to make it happens sooner, and it was >>> on a system with only 2GB memory). >> >>Then what? > > The kernel getting very confused, not being able to read some > files/directories on the HFS+ volume, and having lots of messages in > the kernel log about inconsistent b-tree. > > This appears to be entirely in-memory and has no effect on on-disk data; > also I can reproduce the problem by mounting read-only. If I reboot, > fsck always says the hfs+ volume is clean, and I can repeat the exercise. The bug happened because __hfs_bnode_create() called page_cache_release() right after calling read_mapping_page(). Later the read and immediately released "struct page*" was passed to kmap() and the memory page was read, which worked most of the time under normal circumstances. But under tight memory conditions, by the time the page was kmap()-ped and read, it was either unavailable (see manifestation #1 above) or contained wrong data (see manifestation #2). The patch removes the call to page_cache_release() from __hfs_bnode_create() and adds it to hfs_bnode_free(). Thus making sure mapped pages are available for the entire lifetime of hfs_bnode. In other words, the broken logic of working with b-nodes: "read_mapping_page(), page_cache_release(), kmap(), kunmap()" was changed to the correct one: "read_mapping_page(), kmap(), kunmap(), page_cache_release()". In yet other words, release bnode pages after use, not the other way round. Cc: stable@xxxxxxxxxxxxxxx Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx> Cc: Sougata Santra <sougata@xxxxxxxxxx> Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx> Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx> Reported-by: Tomi Pieviläinen <tpievila@xxxxxxxxx> Reported-by: Chris Murphy <bugzilla@xxxxxxxxxxxxxxxxx> Reported-by: David Leder <david.leder@xxxxxxxxx> Reported-by: Michael Fox <415fox@xxxxxxxxx> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx> Reported-by: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> Reported-by: Luc Pionchon <pionchon.luc@xxxxxxxxx> --- fs/hfsplus/bnode.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c index 759708f..5af50fb 100644 --- a/fs/hfsplus/bnode.c +++ b/fs/hfsplus/bnode.c @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid) page_cache_release(page); goto fail; } - page_cache_release(page); node->page[i] = page; } @@ -566,13 +565,12 @@ node_error: void hfs_bnode_free(struct hfs_bnode *node) { -#if 0 int i; - for (i = 0; i < node->tree->pages_per_bnode; i++) + for (i = 0; i < node->tree->pages_per_bnode; i++) { if (node->page[i]) page_cache_release(node->page[i]); -#endif + } kfree(node); } -- 2.3.0 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html