Re: Subtle races between DAX mmap fault and write path

On Fri, Jul 29, 2016 at 5:12 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Fri, Jul 29, 2016 at 07:44:25AM -0700, Dan Williams wrote:
>> On Thu, Jul 28, 2016 at 7:21 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Jul 28, 2016 at 10:10:33AM +0200, Jan Kara wrote:
>> >> On Thu 28-07-16 08:19:49, Dave Chinner wrote:
>> [..]
>> >> So DAX doesn't need flushing to maintain consistent view of the data but it
>> >> does need flushing to make sure fsync(2) results in data written via mmap
>> >> to reach persistent storage.
>> >
>> > I thought this all changed with the removal of the pcommit
>> > instruction and wmb_pmem() going away.  Isn't it now a platform
>> > requirement that dirty cache lines over persistent memory ranges
>> > are either guaranteed to be flushed to persistent storage on power
>> > fail or when required by REQ_FLUSH?
>>
>> No, nothing automates cache flushing.  The path of a write is:
>>
>> cpu-cache -> cpu-write-buffer -> bus -> imc -> imc-write-buffer -> media
>>
>> The ADR mechanism and the wpq-flush facility flush data through the
>> imc (integrated memory controller) to media.  dax_do_io() gets writes
>> to the imc, but we still need a posted-write-buffer flush mechanism to
>> guarantee data makes it out to media.
>
> So what you are saying is that on an ADR machine, we have these
> domains w.r.t. power fail:
>
> cpu-cache -> cpu-write-buffer -> bus -> imc -> imc-write-buffer -> media
>
> |-------------volatile-------------------|-----persistent--------------|
>
> because anything that gets to the IMC is guaranteed to be flushed to
> stable media on power fail.
>
> But on a posted-write-buffer system, we have this:
>
> cpu-cache -> cpu-write-buffer -> bus -> imc -> imc-write-buffer -> media
>
> |-------------volatile-------------------------------------------|--persistent--|
>
> IOWs, only things already posted to the media via REQ_FLUSH are
> considered stable on persistent media.  What happens in this case
> when power fails during a media update? Incomplete writes?
>
>> > Or have we somehow ended up with the fucked up situation where
>> > dax_do_io() writes are (effectively) immediately persistent and
>> > untracked by internal infrastructure, whilst mmap() writes
>> > require internal dirty tracking and fsync() to flush caches via
>> > writeback?
>>
>> dax_do_io() writes are not immediately persistent.  They bypass the
>> cpu-cache and cpu-write-buffer and are ready to be flushed to media
>> by REQ_FLUSH or power-fail on an ADR system.
>
> IOWs, on an ADR system a write is /effectively/ immediately persistent
> because if power fails ADR guarantees it will be flushed to stable
> media, while on a posted write system it is volatile and will be
> lost. Right?
>
> If so, that's even worse than just having mmap/write behave
> differently - now writes will behave differently depending on the
> specific hardware installed. I think this makes it even more
> important for the DAX code to hide this behaviour from the
> filesystems by treating everything as volatile.

Sorry, I confused things above by implying that Linux will need to
consider NVDIMM platforms without ADR.

ADR is already required for present-day NVDIMM platforms and that
requirement continues.  The explicit flushing allowed by REQ_FLUSH is
an optional mechanism to backstop ADR, but is not required and will
not be used as an alternative to ADR.  See pages 21 and 22 of the
latest driver writer's guide if you want more details [1].

Long story short, we should always consider writes that enter the
persistence domain (movnt + sfence) as persistent regardless of the
presence of WPQ-flush.
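
For illustration only, here is a minimal userspace sketch of what
"movnt + sfence" means in practice.  It is not the kernel's DAX code:
the helper name persist_copy_nt() is hypothetical, the destination is
assumed to be 8-byte aligned with a length that is a multiple of 8, and
an ordinary buffer can stand in for a DAX mapping when trying it out.
On non-x86-64 builds it falls back to a plain memcpy(), which gives no
persistence guarantee at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_stream_si64 (MOVNTI); also provides _mm_sfence */
#endif

/*
 * Hypothetical helper (not a kernel API): copy len bytes to dst using
 * non-temporal stores, then fence.
 *
 * Assumptions: dst is 8-byte aligned and len is a multiple of 8.
 */
static void persist_copy_nt(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst % 8) == 0 && (len % 8) == 0);

#if defined(__x86_64__)
    long long *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < len / 8; i++) {
        long long v;

        memcpy(&v, s + i * 8, sizeof(v));  /* src may be unaligned */
        _mm_stream_si64(&d[i], v);         /* movnti: store bypasses the cpu-cache */
    }

    /*
     * sfence: order the non-temporal stores and drain the write-combining
     * buffers, so the data has been handed to the memory controller,
     * which on an ADR platform is inside the persistence domain.
     */
    _mm_sfence();
#else
    /* Placeholder only: no cache bypass and no persistence guarantee. */
    memcpy(dst, src, len);
#endif
}
```

A store made through an ordinary cached mmap() write takes neither of
these steps, which is why that path still needs dirty tracking and a
cache flush at fsync(2) time.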

[1]: http://pmem.io/documents/NVDIMM_DriverWritersGuide-July-2016.pdf