On Mon, 6 Dec 2010, Michael Leun wrote:
> At the moment I'm trying to create an easy to reproduce scenario.
>

I've managed to reproduce the BUG. At first I thought it had to do
with fork() racing with invalidate_inode_pages2_range(), but it turns
out that just two parallel invocations of
invalidate_inode_pages2_range() with some page faults going on can
trigger it.

The problem is that unmap_mapping_range() is not prepared for more
than one concurrent invocation per inode. For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value. This could go on forever without either of them being able
to finish.

Truncate and hole punching already serialize with i_mutex. Other
callers of unmap_mapping_range() do not, however, and I see difficulty
with doing it in the callers. I think the proper solution is to add
serialization to unmap_mapping_range() itself.

The attached patch attempts to do this without adding more fields to
struct address_space. It fixes the bug in my testing.

Comments?
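[Editor's illustration, not part of the original mail.] The first
scenario above can be sketched as a single-threaded toy model. Nothing
here is kernel code: struct toy_vma, toy_unmap() and the "budget"
parameter are hypothetical stand-ins, with the restart field playing
the role of vm_truncate_count. The interleaving of the two callers is
replayed by hand rather than with real threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Hypothetical stand-in for one vma: per-page mapped state plus a
 * restart cursor shared by every caller (like vm_truncate_count). */
struct toy_vma {
	bool   mapped[TOY_NPAGES];
	bool   in_progress;	/* an earlier unmap stopped part-way */
	size_t restart;		/* first page that unmap has not processed */
};

/* Unmap pages [start, end), visiting at most `budget` pages before
 * giving up and recording a restart point. Returns pages unmapped. */
static size_t toy_unmap(struct toy_vma *v, size_t start, size_t end,
			size_t budget)
{
	size_t unmapped = 0, visited = 0;

	assert(end <= TOY_NPAGES);

	/* Trust the shared cursor: everything below it counts as done. */
	if (v->in_progress && v->restart > start)
		start = v->restart;
	if (start >= end)
		return 0;	/* "already unmapped", nothing to do */

	for (size_t i = start; i < end; i++) {
		if (visited == budget) {
			v->in_progress = true;
			v->restart = i;
			return unmapped;
		}
		if (v->mapped[i]) {
			v->mapped[i] = false;
			unmapped++;
		}
		visited++;
	}
	v->in_progress = false;
	return unmapped;
}

/* Replays the scenario; returns true if page 3 is left mapped after
 * the second caller's single-page request has returned. */
static bool toy_page3_still_mapped(bool serialized)
{
	struct toy_vma v = { .in_progress = false };

	for (size_t i = 0; i < TOY_NPAGES; i++)
		v.mapped[i] = true;

	toy_unmap(&v, 0, TOY_NPAGES, 8);	/* caller A stops at page 8 */
	v.mapped[3] = true;			/* page fault maps page 3 again */

	if (serialized)				/* A runs to completion first */
		toy_unmap(&v, 0, TOY_NPAGES, TOY_NPAGES);

	toy_unmap(&v, 3, 4, 8);			/* caller B: single page 3 */
	return v.mapped[3];
}
```

In this model toy_page3_still_mapped(false) reports that caller B's
request was silently skipped, while toy_page3_still_mapped(true)
reports that it was honoured once caller A had finished and the shared
cursor was no longer in use.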

Thanks,
Miklos

---
 include/linux/pagemap.h |    1 +
 mm/memory.c             |   14 ++++++++++++++
 2 files changed, 15 insertions(+)

Index: linux.git/include/linux/pagemap.h
===================================================================
--- linux.git.orig/include/linux/pagemap.h	2010-11-26 10:52:17.000000000 +0100
+++ linux.git/include/linux/pagemap.h	2010-12-11 13:39:32.000000000 +0100
@@ -24,6 +24,7 @@ enum mapping_flags {
 	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
 	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
+	AS_UNMAPPING	= __GFP_BITS_SHIFT + 4, /* for unmap_mapping_range() */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
Index: linux.git/mm/memory.c
===================================================================
--- linux.git.orig/mm/memory.c	2010-12-11 13:07:28.000000000 +0100
+++ linux.git/mm/memory.c	2010-12-11 14:09:42.000000000 +0100
@@ -2535,6 +2535,12 @@ static inline void unmap_mapping_range_l
 	}
 }
 
+static int mapping_sleep(void *x)
+{
+	schedule();
+	return 0;
+}
+
 /**
  * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
  * @mapping: the address space containing mmaps to be unmapped.
@@ -2572,6 +2578,9 @@ void unmap_mapping_range(struct address_
 		details.last_index = ULONG_MAX;
 	details.i_mmap_lock = &mapping->i_mmap_lock;
 
+	wait_on_bit_lock(&mapping->flags, AS_UNMAPPING, mapping_sleep,
+			 TASK_UNINTERRUPTIBLE);
+
 	spin_lock(&mapping->i_mmap_lock);
 
 	/* Protect against endless unmapping loops */
@@ -2588,6 +2597,11 @@ void unmap_mapping_range(struct address_
 	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
 		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
 	spin_unlock(&mapping->i_mmap_lock);
+
+	clear_bit_unlock(AS_UNMAPPING, &mapping->flags);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&mapping->flags, AS_UNMAPPING);
+
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
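
[Editor's illustration, not part of the original mail.] The patch
serializes callers by treating one bit of mapping->flags as a lock.
The same pattern can be sketched in userspace with C11 atomics. This
is only an analogy under stated assumptions: the toy_* names and the
bit number are hypothetical, and waiting is done by yielding the CPU,
whereas the kernel's wait_on_bit_lock() sleeps on a wait queue and
wake_up_bit() wakes the sleepers.

```c
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_AS_UNMAPPING 4u	/* hypothetical bit number for the lock */

static_assert(TOY_AS_UNMAPPING < sizeof(unsigned long) * CHAR_BIT,
	      "lock bit must fit in the flags word");

/* Try to take the bit lock; returns true if this caller now holds it.
 * Other bits in the flags word are left untouched. */
static bool toy_bit_trylock(atomic_ulong *flags, unsigned bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_fetch_or_explicit(flags, mask,
						     memory_order_acquire);
	return (old & mask) == 0;
}

/* Take the bit lock, yielding until the current holder releases it.
 * (Userspace substitute for sleeping in wait_on_bit_lock().) */
static void toy_bit_lock(atomic_ulong *flags, unsigned bit)
{
	while (!toy_bit_trylock(flags, bit))
		sched_yield();
}

/* Release the bit lock with release ordering, so that writes made
 * while holding it are visible to the next holder. */
static void toy_bit_unlock(atomic_ulong *flags, unsigned bit)
{
	atomic_fetch_and_explicit(flags, ~(1UL << bit),
				  memory_order_release);
}
```

A caller would wrap its critical section in toy_bit_lock() and
toy_bit_unlock(), much as the patch wraps the body of
unmap_mapping_range(). Reusing a spare bit in an existing flags word
is what lets the patch avoid adding a field to struct address_space.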