On Fri, 24 Nov 2017, Dan Williams wrote: > On Wed, Nov 22, 2017 at 6:36 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: > > Hi > > > > Here I'm sending slighly update dm-writecache target. > > > > I would expect some theory of operation documentation in the changelog > or in Documentation/ for a new driver. It caches writes for a block device in persistent memory. It doesn't cache reads because reads are supposed to be cached in normal ram. > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > > > [..] > > +/* > > + * The clflushopt instruction is very slow on Broadwell, so we use non-temporal > > + * stores instead. > > + */ > > The clflushopt instruction's first instantiation in a cpu was after > Broadwell. Also, the performance of clflushopt does not much matter > when running on a system without ADR (asynchronous DRAM refresh) where > flushing the cpu cache can just leave the writes stranded on their way > to the DIMM. ADR is not available on general purpose servers until > ACPI 6.x NFIT-defined persistent memory platforms. There used to be pcommit instruction, but intel abandoned it - see https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction So, according to the document, flushing the cache should be enough for writes to reach persistent memory. > > +#ifdef CONFIG_X86_64 > > In general something is broken if we end up with per-arch ifdefs like > this in drivers to handle persistent memory. This should be using the > pmem api or extensions of it, and we need to settle on a mechanism for > upper-level drivers to ask if pmem is driving platform protected > persistent memory. > > > +#define NT_STORE(dest, src) asm ("movnti %1, %0" : "=m"(dest) : "r"(src)) > > +#define FLUSH_RANGE(dax, ptr, size) do { } while (0) > > +#define COMMIT_FLUSHED() wmb() > > +#else > > +#define NT_STORE(dest, src) ACCESS_ONCE(dest) = (src) > > +#define FLUSH_RANGE dax_flush > > +#define COMMIT_FLUSHED() do { } while (0) > > Is this just for test purposes? How does the user discover that they > are running in a degraded mode as far as persistence guarantees? I > think we should be falling back DM_WRITECACHE_ONLY_SSD mode if we're > not on a pmem platform. What degraded mode do you mean? According to that document, flushing cache should be enough. If it is not enough, what else do I need to do to flush cache? The above code is not for test purposes. It is for performance purposes. On dual-socket Broadwell server with persistent memory, when we write to persistent memory using cached write instructions and use dax_flush afterwards to flush cache for the affected range, the performance is about 350MB/s. It is practically unusable - worse than low-end SSDs. On the other hand, the movnti instruction can sustain performance of one 8-byte write per clock cycle. We don't have to flush cache afterwards, the only thing that must be done is to flush the write-combining buffer with the sfence instruction. Movnti has much better throughput than dax_flush. You argue that I should use the pmem api, but there is no pmem api providing the movnti intruction. I have no choice, but coding the instruction directly in assembler. If you want to create API, take that "NT_STORE" definition and move it to some x86-specific include file. Mikulas -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel