On Fri, 25 Apr 2014, Peter Zijlstra wrote: > On Fri, Apr 25, 2014 at 05:01:23AM -0700, Hugh Dickins wrote: > > One, regarding dirty shared mappings: you're thinking above of > > mmap()'ing proper filesystem files, but this case also includes > > shared memory - I expect there are uses of giant amounts of shared > > memory, for which we really would prefer not to slow the teardown. > > > > And confusingly, those are not subject to the special page_mkclean() > > constraints, but still need to be handled in a correct manner: your > > patch is fine, but might be overkill for them - I'm not yet sure. > > I think we could look at mapping_cap_account_dirty(page->mapping) while > holding the ptelock, the mapping can't go away while we hold that lock. > > And afaict that's the exact differentiator between these two cases. Yes, that's easily done, but I wasn't sure whether it was correct to skip on shmem or not - just because shmem doesn't participate in the page_mkclean() protocol, doesn't imply it's free from similar bugs. I haven't seen a precise description of the bug we're anxious to fix: Dave's MADV_DONTNEED should be easily fixable, that's not a concern; Linus's first patch wrote of writing racing with cleaning, but didn't give a concrete example. How about this: a process with one thread repeatedly (but not very often) writing timestamp into a shared file mapping, but another thread munmaps it at some point; and another process repeatedly (but not very often) reads the timestamp file (or peeks at it through mmap); with memory pressure forcing page reclaim. In the page_mkclean() shared file case, the second process might see the timestamp move backwards: because a write from the timestamping thread went into the pagecache after it had been written, but the page not re-marked dirty; so when reclaimed and later read back from disk, the older timestamp is seen. But I think you can remove "page_mkclean() " from that paragraph: the same can happen with shmem written out to swap. It could not happen with ramfs, but I don't think we're desperate to special case ramfs these days. Hugh > > > Two, Ben said earlier that he's more worried about users of > > unmap_mapping_range() than concurrent munmap(); and you said > > earlier that you would almost prefer to have some special lock > > to serialize with page_mkclean(). > > > > Er, i_mmap_mutex. > > > > That's what unmap_mapping_range(), and page_mkclean()'s rmap_walk, > > take to iterate over the file vmas. So perhaps there's no race at all > > in the unmap_mapping_range() case. And easy (I imagine) to fix the > > race in Dave's racewrite.c use of MADV_DONTNEED: untested patch below. > > Ooh shiney.. yes that might work! > > > But exit and munmap() don't take i_mmap_mutex: perhaps they should > > when encountering a VM_SHARED vma > > Well, they will of course take it in order to detach the vma from the > rmap address_space::i_mmap tree. > > > (I believe VM_SHARED should be > > peculiar to having vm_file seta, but test both below because I don't > > want to oops in some odd corner where a special vma is set up). > > I think you might be on to something there... -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html