Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Feb 21, 2016 at 02:03:43PM -0800, Dan Williams wrote:
> On Sun, Feb 21, 2016 at 1:23 PM, Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote:
> > On 02/21/2016 10:57 PM, Dan Williams wrote:
> >> On Sun, Feb 21, 2016 at 12:24 PM, Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote:
> >>> On 02/21/2016 09:51 PM, Dan Williams wrote:
> > Sure. please have a look. What happens is that the legacy app
> > will add the page to the radix tree, come the fsync it will be
> > flushed. Even though a "new-type" app might fault on the same page
> > before or after, which did not add it to the radix tree.
> > So yes, all pages faulted by legacy apps will be flushed.
> >
> > I have manually tested all this and it seems to work. Can you see
> > a theoretical scenario where it would not?
> 
> I'm worried about the scenario where the pmem aware app assumes that
> none of the cachelines in its mapping are dirty when it goes to issue
> pcommit.  We'll have two applications with different perceptions of
> when writes are durable.  Maybe it's not a problem in practice, at
> least current generation x86 cpus flush existing dirty cachelines when
> performing non-temporal stores.  However, it bothers me that there are
> cpus where a pmem-unaware app could prevent a pmem-aware app from
> making writes durable.  It seems if one app has established a
> MAP_PMEM_AWARE mapping it needs guarantees that all apps participating
> in that shared mapping have the same awareness.

Which, in practice, cannot work. Think cp, rsync, or any other
program a user can run that can read the file the MAP_PMEM_AWARE
application is using.

> Another potential issue is that MAP_PMEM_AWARE is not enough on its
> own.  If the filesystem or inode does not support DAX the application
> needs to assume page cache semantics.  At a minimum MAP_PMEM_AWARE
> requests would need to fail if DAX is not available.

They will always still need to call msync()/fsync() to guarantee
data integrity, because the filesystem metadata that indexes the
data still needs to be committed before data integrity can be
guaranteed. i.e. MAP_PMEM_AWARE by itself it not sufficient for data
integrity, and so the app will have to be written like any other app
that uses page cache based mmap().

Indeed, the application cannot even assume that a fully allocated
file does not require msync/fsync because the filesystem may be
doing things like dedupe, defrag, copy on write, etc behind the back
of the application and so file metadata changes may still be in
volatile RAM even though the application has flushed it's data.
Applications have no idea what the underlying filesystem and storage
is doing and so they cannot assume that complete data integrity is
provided by userspace driven CPU cache flush instructions on their
file data.

This "pmem aware applications only need to commit their data"
thinking is what got us into this mess in the first place. It's
wrong, and we need to stop trying to make pmem work this way because
it's a fundamentally broken concept.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]