Boaz Harrosh <boaz@xxxxxxxxxxxxx> writes:

>> An application creates a file and writes to every single block in the
>> thing, syncs it, and closes it.  It then opens it back up, calls mmap
>> with this new MAP_DAX flag or on a file system mounted with -o dax,
>> and proceeds to access the file using loads and stores.  It persists
>> its data by using non-temporal stores and flushing and fencing cpu
>> instructions.
>>
>> If I understand you correctly, you're saying that that application is
>> not written correctly, because it needs to call fsync to persist
>> metadata (that it presumably did not modify).  Is that right?
>>
>
> Hi Jeff
>
> I do not understand why you chose to drop my email address from your
> reply? What do I need to feel when this happens?

Hi Boaz,

Sorry you were dropped; that was not my intention.  I blame my mailer,
as I did hit reply-all.  No hard feelings?

> And to your questions above, as I answered to Dave:
> This is the novelty of my approach and the big difference between
> what you guys thought with MAP_DAX and my patches as submitted:
> 1. The application will/needs to call m/fsync to give the FS the
>    freedom it needs.
> 2. The m/fsync, as well as the page faults, will be very lightweight
>    and fast; all that is required from the pmem-aware app is to do
>    movnt stores and cl_flushes.

I like the approach for these existing file systems.

> So we enjoy both worlds.  And actually more: with your approach of
> fallocat(ing) all the space in advance, you might as well just
> partition the storage and use the DAX(ed) block device.  But with my
> approach you need not pre-allocate, and you enjoy the over-provisioned
> model and the space allocation management of a modern FS, while still
> enjoying very fast direct-mapped stores by not requiring the current
> slow m/fsync().

Well, that remains to be seen.  Certainly for O_DIRECT appends or hole
filling, there is extra overhead involved when compared to writes to
already-existing blocks.  Apply that to DAX and the overhead will be
much more prominent.  I'm not saying that this is definitely the case,
but I think it's something we'll have to measure going forward.

> I hope you guys stand behind me in my effort to accelerate userspace
> pmem apps and still not break any built-in assumptions.

I do like the idea of reducing the msync/fsync overhead, though I admit
I haven't yet looked at the patches in any detail.  My mail in this
thread was primarily an attempt to wrap my head around why the fs needs
the fsync/msync at all.  I've got that cleared up now.

Cheers,
Jeff
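
P.S. For anyone following the thread, the userspace persistence pattern
under discussion (movnt stores, cache-line flush, store fence, then
m/fsync for the metadata) looks roughly like the sketch below.  This is
illustrative only: it assumes x86-64 with GCC/Clang SSE2 intrinsics, a
fully pre-allocated file on a -o dax mount (so plain MAP_SHARED, since
MAP_DAX is only a proposal), and the function and path names are made up
for the example.

/*
 * Sketch: persist one value to a DAX-mapped file.
 * Assumes x86-64 intrinsics and a pre-allocated file on a -o dax mount.
 */
#include <emmintrin.h>   /* _mm_stream_si64, _mm_clflush, _mm_sfence */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int persist_counter(const char *path, long long value)
{
        int fd = open(path, O_RDWR);
        if (fd < 0)
                return -1;

        long long *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                close(fd);
                return -1;
        }

        /* Non-temporal (movnt) store: bypasses the CPU cache. */
        _mm_stream_si64(p, value);

        /* Flush the line in case earlier cached stores touched it. */
        _mm_clflush(p);

        /* Fence so the stores above are ordered before what follows. */
        _mm_sfence();

        /*
         * The point of the thread: the data itself is now durable, but
         * any fs metadata (e.g. from block allocation) is not committed
         * until the fs is told, hence the m/fsync.
         */
        int ret = msync(p, 4096, MS_SYNC);

        munmap(p, 4096);
        close(fd);
        return ret;
}

Under Boaz's proposal, that trailing msync is the cheap part; under the
fallocate-everything-up-front model it can be skipped entirely, which is
the trade-off being argued over above.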