On Fri, 25 Apr 2014, Linus Torvalds wrote: > On Fri, Apr 25, 2014 at 5:01 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > > Two, Ben said earlier that he's more worried about users of > > unmap_mapping_range() than concurrent munmap(); and you said > > earlier that you would almost prefer to have some special lock > > to serialize with page_mkclean(). > > > > Er, i_mmap_mutex. > > > > That's what unmap_mapping_range(), and page_mkclean()'s rmap_walk, > > take to iterate over the file vmas. So perhaps there's no race at all > > in the unmap_mapping_range() case. And easy (I imagine) to fix the > > race in Dave's racewrite.c use of MADV_DONTNEED: untested patch below. > > Hmm. unmap_mapping_range() is just abotu the only thing that _does_ > take i_mmap_mutex. unmap_single_vma() does it for > is_vm_hugetlb_page(), which is a bit confusing. And normally we only > take it for the actual final vma link/unlink, not for the actual > traversal. So we'd have to change that all quite radically (or we'd > have to drop and re-take it). > > So I'm not quite convinced. Your simple patch looks simple and should > certainly fix DaveH's test-case, but then leaves munmap/exit as a > separate thing to fix. And I don't see how to do that cleanly (it > really looks like "we'll just have to take that semaphore again > separately). Yes, mine is quite nice for the MADV_DONTNEED case, but needs more complication to handle munmap/exit. I don't want to drop and retake, I'm hoping we can decide what to do via the zap_details. It would still be messy that sometimes we come in with the mutex and sometimes we take it inside; but then it's already messy that sometimes we have it and sometimes we don't. I'll try extending the patch to munmap/exit in a little bit, and send out the result for comparison later today. > > i_mmap_mutex is likely not contended, but we *do* take it for private > mappings too (and for read-only ones), so this lock is actually much > more common than the dirty shared mapping. We only need to take it in the (presumed rare beyond shared memory) VM_SHARED case. I'm not very keen on adding a mutex into the exit path, but at least this one used to be a spinlock, and there should be nowhere that tries to allocate memory while holding it (I'm wary of adding OOM-kill deadlocks) - aside from tlb_next_batch()'s GFP_NOWAIT|__GFP_NOWARN attempts. > > So I think I prefer my patch, even if that may be partly due to just > it being mine ;) Yes, fair enough, I'm not against it: I just felt rather ashamed of going on about page_mkclean() protocol and ptlock, while forgetting all about i_mmap_mutex. Let's see how beautiful mine turns out ;) And it may be worth admitting that here we avoid CONFIG_PREEMPT, and have been bitten in the past by free_pages_and_swap_cache() latencies, so drastically reduced MAX_GATHER_BATCH_COUNT: so we wouldn't expect to see much hit from your more frequent TLB flushes. I must answer Peter now... Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html