Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 24 Feb 2016 10:28:13 +1100

On Wed, Feb 24, 2016 at 12:15:34AM +0200, Boaz Harrosh wrote:
> On 02/23/2016 11:47 PM, Dave Chinner wrote:
> <>
> > 
> > i.e. what we've implemented right now is a basic, slow,
> > easy-to-make-work-correctly brute force solution. That doesn't mean
> > we always need to implement it this way, or that we are bound by the
> > way dax_clear_sectors() currently flushes cachelines before it
> > returns. It's just a simple implementation that provides the
> > ordering the *filesystem requires* to provide the correct data
> > integrity semantics to userspace.
> > 
> 
> Or it can be written properly with movnt instructions and be even
> faster than a simple memset, with no need for any cl_flushing, let
> alone any radix-tree locking.

Precisely my point - semantics of persistent memory durability are
going to change from kernel to kernel, architecture to architecture,
and hardware to hardware.

Assuming applications are going to handle all these wacky
differences to provide their users with robust data integrity is a
recipe for disaster. If application writers can't even use fsync
properly, I can guarantee you they are going to completely fuck up
data integrity when targeting pmem.
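
For illustration only, here is a minimal sketch of the portable path
an application already has today: write through a shared mapping,
then ask the kernel to make it durable and check every return value.
The helper name durable_mmap_write() is hypothetical and not an
existing API; it uses only standard POSIX calls and deliberately
assumes nothing about cachelines, movnt or the underlying hardware.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): copy len bytes into the
 * file at path through a MAP_SHARED mapping and make them durable.
 * Returns 0 on success, -1 on any error.
 */
static int durable_mmap_write(const char *path, const void *buf, size_t len)
{
	int fd, ret = -1;
	void *map;

	if (len == 0)
		return -1;	/* mmap() rejects zero-length mappings */

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	/* Size the file so the whole mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	memcpy(map, buf, len);

	/*
	 * msync() writes back the dirty range, fsync() covers the file
	 * metadata (e.g. the new size). Both results must be checked:
	 * an ignored error here is silent data loss.
	 */
	if (msync(map, len, MS_SYNC) == 0 && fsync(fd) == 0)
		ret = 0;

	munmap(map, len);
out_close:
	if (close(fd) < 0)
		ret = -1;
	return ret;
}
```

How the kernel implements that msync/fsync underneath (cacheline
flushes, movnt, whatever) can then change without the application
having to know or care.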

> That said, your suggestion above is 25%-100% slower than current code
> because the cl_flushes will be needed eventually, and the atomics of a
> lock take 25% of the time of a full page copy.

So what? We can optimise for performance later, once we've provided
correct and resilient infrastructure. We've been fighting against
premature optimisation for performance from the start with DAX -
we've repeatedly had to undo stuff that was fast but broken, and
we're not doing that any more. Correctness comes first, then we can
address the performance issues via iterative improvement, like we do
with everything else.

> You are forgetting we are
> talking about memory and not hard disk. The rules are different.

That's bullshit, Boaz. I'm sick and tired of people saying "but pmem
is different" as justification for not providing correct, reliable
data integrity behaviour. Filesystems on PMEM have to follow all the
same rules as any other type of persistent storage we put
filesystems on.

Yes, the speed of the storage may expose the fact that an
unoptimised correct implementation is a lot more expensive than
ignoring correctness, but that does not mean we can ignore
correctness. Nor does it mean that a correct implementation will be
slow - it just means we haven't optimised for speed yet because
getting it correct is a hard problem and our primary focus.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx