Re: munmap, msync: synchronization

Christoph,

On 04/21/2014 08:14 PM, Christoph Hellwig wrote:
> On Mon, Apr 21, 2014 at 12:16:46PM +0200, Michael Kerrisk (man-pages) wrote:
>> 1. In the bad old days (even on Linux, AFAIK, but that was in days
>>    before I looked closely at what goes on), the page cache and
>>    the buffer cache were not unified. That meant that a page from 
>>    a file might both be in the buffer cache (because of file I/O
>>    syscalls) and in the page cache (because of mmap()).
> 
> Correct.
> 
>> 2. In a non-unified cache system, pages can naturally get out of
>>    synch in the two locations. Before it had a unified cache, Linux 
>>    used to jump through some hoops to ensure that contents in the two 
>>    locations remained consistent.
> 
> Yeah.
> 
>> 3. Nowadays Linux--like most (all?) UNIX systems--has a 
>>    unified cache: file I/O, mmap(), and the paging system all 
>>    use the same cache. If a file is mmap()-ed and also subject
>>    to file I/O, there will be only one copy of each file page 
>>    in the cache. Ergo, the inconsistency problem goes away.
> 
> Mostly true, except for FreeBSD and Solaris when they use ZFS, which has
> its own file cache that is not coherent with the VM cache at the
> implementation level.  Not sure how much of this leaks to userspace,
> though.

Thanks for that detail.
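
To make point 3 concrete, here is a minimal sketch of how one might check
the "only one copy" behaviour: store a byte through a MAP_SHARED mapping,
then read it back with pread() without any msync() in between. The helper
name and the path argument are made up for illustration, and the result
of 1 is only what I would expect on a unified-cache system such as Linux.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe (name invented for this example).
 * Returns 1 if a byte stored through a MAP_SHARED mapping is seen by
 * pread() on the same file with no intervening msync(), 0 if it is not,
 * and -1 if the setup fails. The file at path is created and removed. */
int mapped_store_visible_to_read(const char *path)
{
    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;

    /* The file must cover the mapped byte, or the store raises SIGBUS. */
    if (ftruncate(fd, 1) == 0) {
        unsigned char *map = mmap(NULL, 1, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            unsigned char seen = 0;

            map[0] = 'x';                      /* update via the mapping */
            if (pread(fd, &seen, 1, 0) == 1)   /* read via file I/O */
                result = (seen == 'x');
            munmap(map, 1);
        }
    }

    close(fd);
    unlink(path);
    return result;
}
```

On a non-unified cache the same probe could legitimately return 0, which
is exactly the inconsistency that points 1 and 2 describe.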

>> 4. IIUC, the pieces like msync(MS_ASYNC) and msync(MS_INVALIDATE)
>>    exist only because of the bad old non-unified cache days.
>>    MS_INVALIDATE was a way of saying: make sure that writes
>>    to the file by other processes are visible in this mapping.
>>    msync() without the MS_INVALIDATE flag was a way of saying:
>>    make sure that read()s from the file see the changes made
>>    via this mapping. Using either MS_SYNC or MS_ASYNC
>>    was the way of saying: "I either want to wait until the file
>>    updates have been completed", or "please start the updates
>>    now, but I don't want to wait until they're completed".
> 
> Right.
> 
>> 5. On systems with a unified cache, msync(MS_INVALIDATE)
>>    is a no-op. (That is so on Linux.)
> 
> Almost.  It returns EBUSY if it hits any mlock()ed region.  Don't ask me
> why, though..

Ahhh yes, I was aware of that detail, but overlooked it in the point 
above.
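
For anyone who wants to see the EBUSY case for themselves, a rough probe
along the following lines should do. The function name is invented for
this example, and it assumes the process is allowed to mlock() a single
page (RLIMIT_MEMLOCK permitting). MS_ASYNC is passed alongside
MS_INVALIDATE only because POSIX asks for one of MS_ASYNC or MS_SYNC to
be specified.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe (name invented for this example).
 * Maps one page of a temporary file, locks it with mlock(), then calls
 * msync() with MS_INVALIDATE on it.
 * Returns 0 if msync() succeeded, the errno value if msync() failed,
 * or -1 if any setup step (tmpfile, ftruncate, mmap, mlock) failed. */
int probe_invalidate_on_locked_page(void)
{
    int result = -1;
    long page = sysconf(_SC_PAGESIZE);
    FILE *fp;

    if (page <= 0)
        return -1;

    fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (ftruncate(fileno(fp), (off_t)page) == 0) {
        void *addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fileno(fp), 0);
        if (addr != MAP_FAILED) {
            if (mlock(addr, (size_t)page) == 0) {
                if (msync(addr, (size_t)page,
                          MS_ASYNC | MS_INVALIDATE) == 0)
                    result = 0;
                else
                    result = errno;   /* expected to be EBUSY on Linux */
                munlock(addr, (size_t)page);
            }
            munmap(addr, (size_t)page);
        }
    }

    fclose(fp);   /* also removes the temporary file */
    return result;
}
```

A return value of -1 most likely means mlock() was refused, in which case
the probe says nothing about msync() either way.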

>> 6. On Linux, MS_ASYNC is also a no-op. That's fine on a unified 
>>    cache system. Filesystem I/O always sees a consistent view,
>>    and MS_ASYNC never undertook to give a guarantee about *when*
>>    the update would occur. (The Linux buffer cache logic will 
>>    ensure that it is flushed out sometime in the near future.)
> 
> Right.  It's a fairly inefficient noop, though - it actually loops
> over all vmas to do nothing with them.
> 
>> 7. On Linux (and probably many other modern systems), the only
>>    call that has any real use is msync(MS_SYNC), meaning
>>    "flush the buffers *now*, and I want to wait for that to 
>>    complete, so that I can then continue safe in the knowledge
>>    that my data has landed on a device". That's useful if we
>>    want insurance for our data in the event of a system crash.
> 
> Right.  It's basically another way to call fsync, which is used to
> implement it underneath.  It actually should be a ranged-fdatasync
> but right now it's implemented horribly inefficiently in that it
> does a fsync call for each vma that it encounters in the range
> specified.
> 
>> 8. POSIX makes no mandate for a unified cache system. Thus,
>>    we have MS_ASYNC and MS_INVALIDATE in the standard, and
>>    the standard says nothing (AFAIK) about whether munmap() 
>>    will flush data. On Linux (and probably most modern systems),
>>    we're fine, but portable applications that care about 
>>    standards and nonunified caches need to use msync().
>>
>>    My advice: To ensure that the contents of a shared file
>>    mapping are written to the underlying file--even on bad old
>>    implementations--a call to msync() should be made before 
>>    unmapping a mapping with munmap().
> 
> Agreed.

Thanks for checking all of this over and thanks also
for confirming that I learned my lessons well in the
"Jamie Lokier school of tough technical reviewing" ;-).
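
The advice in point 8 can be sketched in C roughly as follows. The helper
name and its signature are invented for this example; the point is only
the ordering, with msync(MS_SYNC) called explicitly before munmap()
rather than relying on munmap() to write anything back.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (name and signature invented for this example).
 * Stores len bytes of data in the file at path through a shared
 * mapping and flushes with msync() before unmapping.
 * Returns 0 on success, -1 on failure. */
int write_via_mapping(const char *path, const void *data, size_t len)
{
    int fd, rc;
    void *addr;

    assert(len > 0);   /* mmap() rejects zero-length mappings */

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* Size the file first: touching pages beyond EOF raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(addr, data, len);

    /* Explicit flush; wait until the update has completed. */
    rc = msync(addr, len, MS_SYNC);

    if (munmap(addr, len) == -1)
        rc = -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

A caller would use it as write_via_mapping("some-file", buf, buflen) and
treat a return of -1 as "the data may not have reached the file".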

>> 9. The mmap() man page says this:
>>
>>        MAP_SHARED 
>>            Share this mapping.  Updates to the mapping are
>>            visible to other processes that map this file, and
>>            are carried through to the underlying file.  The file
>>            may not actually be updated until msync(2) or
>>            munmap() is called.
>>
>>    I believe the piece "or munmap()" is misleading. It implies
>>    that munmap() must trigger a sync action. I don't think this
>>    is true. All that it is required to do is remove some range
>>    of pages from the process's virtual address space. I'm
>>    inclined to remove those words, but I'd like to see if any
>>    FS person has a correction to my understanding first.
> 
> I would expect non-coherent systems to update their caches on munmap,
> but POSIX does not seem to require this, and I can't find any language
> towards that in the HP-UX man page, which was a system that I remember
> as non-coherent until the end.

Yes, that's how I read it too. POSIX seems to have no requirements here,
so I assume it was catering to the lowest common denominator.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/