Re: Dirty/Access bits vs. page content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 24 Apr 2014, Linus Torvalds wrote:
> On Thu, Apr 24, 2014 at 7:50 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> >
> > The cases where they occur the mappings tend to be highly stable, i.e.
> > map once *specifically* to be able to do a whole bunch of things without
> > system calls, and then unmap when done.
> 
> Yes. But even that tends to be unusual. mmap() really is bad at
> writing, since you inevitably get read-modify-write patterns etc. So
> it's only useful for fixing up things after-the-fact, which in itself
> is a horrible pattern.
> 
> Don't get me wrong - it exists, but it's really quite rare because it
> has so many problems. Even people who do "fixup" kind of stuff tend to
> map things privately, change things, and then write out the end
> result. That way you can get atomicity by then doing a single
> "rename()" at the end, for example.
> 
> The traditional case for it used to be the nntp index, and these days
> I know some imap indexer (dovecot?) uses it. Every other example of it
> I have ever seen has been a VM stress tester..

Your patch looks good to me (nice use of force_flush), and runs fine
here in normal usage; but I've not actually tried Dave's racewrite.c.

However, I have had a couple of contrarian half-thoughts, that
ordinarily I'd prefer to mull over more before blurting out,
but in the circumstances better say sooner than later.

One, regarding dirty shared mappings: you're thinking above of
mmap()'ing proper filesystem files, but this case also includes
shared memory - I expect there are uses of giant amounts of shared
memory, for which we really would prefer not to slow the teardown.

And confusingly, those are not subject to the special page_mkclean()
constraints, but still need to be handled in a correct manner: your
patch is fine, but might be overkill for them - I'm not yet sure.

Two, Ben said earlier that he's more worried about users of
unmap_mapping_range() than concurrent munmap(); and you said
earlier that you would almost prefer to have some special lock
to serialize with page_mkclean().

Er, i_mmap_mutex.

That's what unmap_mapping_range(), and page_mkclean()'s rmap_walk,
take to iterate over the file vmas.  So perhaps there's no race at all
in the unmap_mapping_range() case.  And easy (I imagine) to fix the
race in Dave's racewrite.c use of MADV_DONTNEED: untested patch below.

But exit and munmap() don't take i_mmap_mutex: perhaps they should
when encountering a VM_SHARED vma (I believe VM_SHARED should be
peculiar to having vm_file set, but test both below because I don't
want to oops in some odd corner where a special vma is set up).

Hugh

--- 3.15-rc2/mm/madvise.c	2013-11-03 15:41:51.000000000 -0800
+++ linux/mm/madvise.c	2014-04-25 04:10:40.124514427 -0700
@@ -274,10 +274,16 @@ static long madvise_dontneed(struct vm_a
 			     struct vm_area_struct **prev,
 			     unsigned long start, unsigned long end)
 {
+	struct address_space *mapping = NULL;
+
 	*prev = vma;
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		mapping = vma->vm_file->f_mapping;
+		mutex_lock(&mapping->i_mmap_mutex);
+	}
 	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
 		struct zap_details details = {
 			.nonlinear_vma = vma,
@@ -286,6 +292,8 @@ static long madvise_dontneed(struct vm_a
 		zap_page_range(vma, start, end - start, &details);
 	} else
 		zap_page_range(vma, start, end - start, NULL);
+	if (mapping)
+		mutex_unlock(&mapping->i_mmap_mutex);
 	return 0;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]