On 11/01/2017 11:20 PM, Dan Williams wrote:
On 11/01/2017 12:25 PM, Dan Williams wrote:
[..]
It's not persistent memory if it requires a hypercall to make it
persistent. Unless memory writes can be made durable purely with cpu
instructions it's dangerous for it to be treated as a PMEM range.
Consider a guest that tried to map it with device-dax which has no
facility to route requests to a special flushing interface.
Can we separate the concept of flush interface from persistent memory?
Say there are two APIs: one indicates the memory type (e.g.,
/proc/iomem) and the other indicates the flush interface.
So for existing nvdimm hardware:
1: Persist-memory + CLFLUSH
2: Persist-memory + flush-hint-table (I know Intel does not use it)
and for the virtual nvdimm which is backed by normal storage:
Persist-memory + virtual flush interface
I see the flush interface as fundamental to identifying the media
properties. It's not byte-addressable persistent memory if the
application needs to call a sideband interface to manage writes. This
is why we have pushed for something like the MAP_SYNC interface to
make filesystem-dax actually behave in a way that applications can
safely treat it as persistent memory, and this is also the guarantee
that device-dax provides. Changing the flush interface makes it
distinct and unusable for applications that want to manage data
persistence in userspace.
I was thinking that from the device's perspective, both of them are
not persistent until a flush operation is issued (clflush or virtual
flush-interface). But you are right, from the user/software's
perspective, their fundamentals are different.
So for the virtual nvdimm which is backed by normal storage, we
should refuse MAP_SYNC and the only way to guarantee persistence
is fsync/fdatasync.
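
To make the application side of this concrete, here is a minimal userspace
sketch in C, not taken from any patch in this thread. It asks for
MAP_SHARED_VALIDATE | MAP_SYNC and, if the kernel refuses, falls back to a
plain shared mapping whose writes are only durable after fdatasync(). The
names open_mapping(), make_durable(), close_mapping() and struct mapping
are hypothetical. On the MAP_SYNC path, msync() is used only as a portable
stand-in for the CPU cache-flush instructions a pmem-aware program would
issue.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack these; the values match the Linux UAPI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical: how this mapping has to be made durable. */
enum persist_mode { PERSIST_CPU_FLUSH, PERSIST_FSYNC };

struct mapping {
    void *addr;
    size_t len;
    int fd;
    enum persist_mode mode;
};

/* Map 'len' bytes of 'path'. Returns 0 on success, -1 with errno set. */
int open_mapping(const char *path, size_t len, struct mapping *m)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        m->mode = PERSIST_CPU_FLUSH;
    } else if (errno == EOPNOTSUPP || errno == EINVAL) {
        /* MAP_SYNC refused (or unknown to this kernel): fsync-only mapping. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        m->mode = PERSIST_FSYNC;
    } else {
        close(fd);
        return -1;
    }

    m->addr = p;
    m->len = len;
    m->fd = fd;
    return 0;
}

/* Make all prior stores to the mapping durable. */
int make_durable(const struct mapping *m)
{
    if (m->mode == PERSIST_FSYNC)
        return fdatasync(m->fd);
    /*
     * With MAP_SYNC a pmem-aware program would flush the dirty cache
     * lines itself (e.g. CLWB + SFENCE). msync() is a conservative,
     * portable stand-in for that here.
     */
    return msync(m->addr, m->len, MS_SYNC);
}

void close_mapping(struct mapping *m)
{
    munmap(m->addr, m->len);
    close(m->fd);
}
```

On a file system without DAX the first mmap() is expected to fail and the
mapping ends up in PERSIST_FSYNC mode, which is the behaviour proposed
above for a virtual nvdimm backed by normal storage.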
Actually, we can give a SPA region that is associated with a specific
flush interface a special GUID, as you proposed; please see the
comments below...
In what way is this "more complicated"? It was trivial to add support
for the "volatile" NFIT range, this will not be any more complicated
than that.
Introducing a memory type is easy indeed; however, a new flush interface
definition is inevitable, i.e., we need a standard way to discover the
MMIO regions used to communicate with the host.
Right, the proposed way to do that for x86 platforms is a new SPA
Range GUID type in the NFIT.
So this SPA is used for both the persistent memory region and the flush
interface? Maybe I missed it in previous mails, could you please detail
how to do it?
Yes, the GUID will specifically identify this range as "Virtio Shared
Memory" (or whatever name survives after a bikeshed debate). The
libnvdimm core then needs to grow a new region type that mostly
behaves the same as a "pmem" region, but drivers/nvdimm/pmem.c grows a
new flush interface to perform the host communication. Device-dax
would be disallowed from attaching to this region type, or we could
grow a new device-dax type that does not allow the raw device to be
mapped, but allows a filesystem mounted on top to manage the flush
interface.
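
The shape of that proposal can be modelled in a few lines of standalone C.
This is only an illustrative sketch, not libnvdimm code: the region type
selects the flush routine, and device-dax is refused for the type that
needs host communication. Every identifier below (region_type,
REGION_HOST_FLUSH, region_init(), device_dax_may_attach(), and so on) is
hypothetical, and the flush bodies are stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region types: ordinary pmem vs. the new host-flushed range. */
enum region_type { REGION_PMEM, REGION_HOST_FLUSH };

struct region;
typedef int (*flush_fn)(struct region *r);

struct region {
    enum region_type type;
    flush_fn flush;                  /* chosen once, from the region type */
    unsigned long host_flush_calls;  /* counts simulated host round trips */
};

/* Stand-in for flushing with CPU instructions only. */
static int flush_cpu(struct region *r)
{
    (void)r;
    return 0;
}

/* Stand-in for asking the host to flush; a real driver would trap here. */
static int flush_host(struct region *r)
{
    r->host_flush_calls++;
    return 0;
}

void region_init(struct region *r, enum region_type type)
{
    assert(r != NULL);
    r->type = type;
    r->host_flush_calls = 0;
    r->flush = (type == REGION_HOST_FLUSH) ? flush_host : flush_cpu;
}

/* Raw device-dax mappings have no way to reach a sideband flush. */
bool device_dax_may_attach(const struct region *r)
{
    return r->type != REGION_HOST_FLUSH;
}

int region_flush(struct region *r)
{
    return r->flush(r);
}
```

The block path always goes through region_flush(), so the host-flushed
region mostly behaves like "pmem" for a filesystem on top, while the
attach check keeps raw userspace mappings away from it.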
I am afraid it is not a good idea to use a single SPA range for multiple
purposes. The region used as "pmem" is directly mapped into the VM so
that the guest can freely access it without the host's assistance. The
region used for "host communication", however, is not mapped into the VM,
so an access causes a VM exit and the host gets the chance to do specific
operations, e.g., flushing the cache. So we had better define these two
regions distinctly to avoid unnecessary complexity in the hypervisor.
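
As a rough illustration of the two-region layout, here is a standalone C
sketch of how a hypervisor could classify a guest physical address. It is
not based on any existing hypervisor code. All names (gpa_range,
vnvdimm_layout, classify_access(), layout_is_valid()) are hypothetical,
and both ranges are assumed to be non-empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* How the hypervisor handles a guest access to a given address. */
enum access_path { ACCESS_DIRECT, ACCESS_TRAP, ACCESS_UNMAPPED };

struct gpa_range {
    uint64_t base;
    uint64_t size;  /* assumed > 0 */
};

/* Hypothetical layout: two distinct guest physical ranges. */
struct vnvdimm_layout {
    struct gpa_range data;     /* mapped into the guest, no VM exit */
    struct gpa_range control;  /* left unmapped, every access traps */
};

static bool in_range(const struct gpa_range *r, uint64_t gpa)
{
    /* Written with a subtraction so base + size cannot overflow. */
    return gpa >= r->base && gpa - r->base < r->size;
}

/* The two ranges must both be non-empty and must not overlap. */
bool layout_is_valid(const struct vnvdimm_layout *l)
{
    if (l->data.size == 0 || l->control.size == 0)
        return false;
    return !in_range(&l->data, l->control.base) &&
           !in_range(&l->control, l->data.base);
}

enum access_path classify_access(const struct vnvdimm_layout *l, uint64_t gpa)
{
    if (in_range(&l->data, gpa))
        return ACCESS_DIRECT;  /* guest reads and writes at memory speed */
    if (in_range(&l->control, gpa))
        return ACCESS_TRAP;    /* host gets control, e.g. to flush */
    return ACCESS_UNMAPPED;
}
```

With separate ranges the classification is a plain address lookup; with a
single range serving both purposes the hypervisor would need extra state
to decide which accesses to trap.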