This may explain some of the userspace issues we've been seeing. ----- Forwarded message from Hugh Dickins <hugh@xxxxxxxxxxx> ----- Sender: linux-arch-owner@xxxxxxxxxxxxxxx From: Hugh Dickins <hugh@xxxxxxxxxxx> Subject: [PATCH] fix mapping_writably_mapped() To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Lee Schermerhorn <Lee.Schermerhorn@xxxxxx>, linux-mm@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-arch@xxxxxxxxxxxxxxx, stable@xxxxxxxxxx Date: Wed, 10 Dec 2008 20:48:52 +0000 (GMT) Message-ID: <Pine.LNX.4.64.0812102043060.25282@xxxxxxxxxxxxx> Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped test in 2.6.7! Bad bad bug, good good find. The i_mmap_writable count must be incremented for VM_SHARED (just as i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock) when dup_mmap() copies the vma for fork: it has its own more optimal version of __vma_link_file(), and I missed this out. So the count was later going down to 0 (dangerous) when one end unmapped, then wrapping negative (inefficient) when the other end unmapped. The only impact on x86 would have been that setting a mandatory lock on a file which has at some time been opened O_RDWR and mapped MAP_SHARED (but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN when it should succeed, or succeed when it should fail. But those architectures which rely on flush_dcache_page() to flush userspace modifications back into the page before the kernel reads it, may in some cases have skipped the flush after such a fork - though any repetitive test will soon wrap the count negative, in which case it will flush_dcache_page() unnecessarily. Fix would be a two-liner, but mapping variable added, and comment moved. Reported-by: Lee Schermerhorn <Lee.Schermerhorn@xxxxxx> Signed-off-by: Hugh Dickins <hugh@xxxxxxxxxxx> --- kernel/fork.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) --- 2.6.28-rc7/kernel/fork.c 2008-11-15 23:09:30.000000000 +0000 +++ linux/kernel/fork.c 2008-12-10 12:49:13.000000000 +0000 @@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm file = tmp->vm_file; if (file) { struct inode *inode = file->f_path.dentry->d_inode; + struct address_space *mapping = file->f_mapping; + get_file(file); if (tmp->vm_flags & VM_DENYWRITE) atomic_dec(&inode->i_writecount); - - /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + spin_lock(&mapping->i_mmap_lock); + if (tmp->vm_flags & VM_SHARED) + mapping->i_mmap_writable++; tmp->vm_truncate_count = mpnt->vm_truncate_count; - flush_dcache_mmap_lock(file->f_mapping); + flush_dcache_mmap_lock(mapping); + /* insert tmp into the share list, just after mpnt */ vma_prio_tree_add(tmp, mpnt); - flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + flush_dcache_mmap_unlock(mapping); + spin_unlock(&mapping->i_mmap_lock); } /* -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html ----- End forwarded message ----- -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html