On Wed, Feb 19, 2020 at 09:28:15AM -0500, Qian Cai wrote:
> struct vm_area_struct could be accessed concurrently as noticed by
> KCSAN,
>
>  write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
>   vma_interval_tree_insert+0x101/0x150:
>    rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
>    (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
>   __vma_link_file+0x6e/0xe0
>    __vma_link_file at mm/mmap.c:629
>   vma_link+0xa2/0x120
>   mmap_region+0x753/0xb90
>   do_mmap+0x45c/0x710
>   vm_mmap_pgoff+0xc0/0x130
>   ksys_mmap_pgoff+0x1d1/0x300
>   __x64_sys_mmap+0x33/0x40
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
>  read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
>   vm_area_dup+0x6a/0xe0
>    vm_area_dup at kernel/fork.c:362
>   __split_vma+0x72/0x2a0
>    __split_vma at mm/mmap.c:2661
>   split_vma+0x5a/0x80
>   mprotect_fixup+0x368/0x3f0
>   do_mprotect_pkey+0x263/0x420
>   __x64_sys_mprotect+0x51/0x70
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> vm_area_dup() blindly copies all fields of the original VMA to the new
> one. This includes copying vm_area_struct::shared.rb, which is normally
> protected by i_mmap_lock. But this is fine because the value read will
> be overwritten by the subsequent __vma_link_file() under proper
> protection. Thus, mark it as an intentional data race and insert a few
> assertions for the fields that should not be modified concurrently.
>
> Signed-off-by: Qian Cai <cai@xxxxxx>

Queued for safekeeping on -rcu.  I had to adjust a bit to get it to
apply on -rcu, please see below.  In my experience, git should have no
trouble figuring it out.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit 1228aca56f2a25b67876d8a819437b620a6e1cee
Author: Qian Cai <cai@xxxxxx>
Date:   Wed Feb 19 11:00:54 2020 -0800

    fork: Annotate a data race in vm_area_dup()

    struct vm_area_struct could be accessed concurrently as noticed by
    KCSAN,

     write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
      vma_interval_tree_insert+0x101/0x150:
       rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
       (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
      __vma_link_file+0x6e/0xe0
       __vma_link_file at mm/mmap.c:629
      vma_link+0xa2/0x120
      mmap_region+0x753/0xb90
      do_mmap+0x45c/0x710
      vm_mmap_pgoff+0xc0/0x130
      ksys_mmap_pgoff+0x1d1/0x300
      __x64_sys_mmap+0x33/0x40
      do_syscall_64+0x91/0xc44
      entry_SYSCALL_64_after_hwframe+0x49/0xbe

     read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
      vm_area_dup+0x6a/0xe0
       vm_area_dup at kernel/fork.c:362
      __split_vma+0x72/0x2a0
       __split_vma at mm/mmap.c:2661
      split_vma+0x5a/0x80
      mprotect_fixup+0x368/0x3f0
      do_mprotect_pkey+0x263/0x420
      __x64_sys_mprotect+0x51/0x70
      do_syscall_64+0x91/0xc44
      entry_SYSCALL_64_after_hwframe+0x49/0xbe

    vm_area_dup() blindly copies all fields of the original VMA to the new
    one. This includes copying vm_area_struct::shared.rb, which is normally
    protected by i_mmap_lock. But this is fine because the value read will
    be overwritten by the subsequent __vma_link_file() under proper
    protection. Thus, mark it as an intentional data race and insert a few
    assertions for the fields that should not be modified concurrently.

    Signed-off-by: Qian Cai <cai@xxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/fork.c b/kernel/fork.c
index 60a1295..e592e6f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
 	if (new) {
-		*new = *orig;
+		ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+		ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+		/*
+		 * orig->shared.rb may be modified concurrently, but the clone
+		 * will be reinitialized.
+		 */
+		*new = data_race(*orig);
 		INIT_LIST_HEAD(&new->anon_vma_chain);
 	}
 	return new;
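
For anyone reading along who has not used the KCSAN annotations yet, the
pattern in the hunk above boils down to the minimal sketch below.  The
struct, function, and field names are made up purely for illustration
(kernel context assumed; header locations can differ between trees):

#include <linux/compiler.h>      /* data_race() */
#include <linux/kcsan-checks.h>  /* ASSERT_EXCLUSIVE_WRITER() */
#include <linux/rbtree.h>

/*
 * Hypothetical structure: one field with a single expected writer, and
 * one field that other CPUs may legitimately write under a different lock.
 */
struct foo {
	unsigned long flags;	/* a concurrent writer would be a bug */
	struct rb_node rb;	/* may be written concurrently elsewhere */
};

static void foo_copy(struct foo *new, struct foo *orig)
{
	/* Ask KCSAN to report any concurrent writer to orig->flags. */
	ASSERT_EXCLUSIVE_WRITER(orig->flags);

	/*
	 * The racy read of orig->rb is tolerated because the copy is
	 * reinitialized before use, so tell KCSAN not to report it.
	 */
	*new = data_race(*orig);
}

In other words, data_race() silences the report for an access that is
known to be benign, while the ASSERT_EXCLUSIVE_WRITER() calls make KCSAN
complain if the fields that are supposed to be stable ever do gain a
concurrent writer.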