Re: KVM "fake DAX" flushing interface - discussion

Dan Williams <dan.j.williams@xxxxxxxxx> · Tue, 31 Oct 2017 21:25:22 -0700

On Tue, Oct 31, 2017 at 8:43 PM, Xiao Guangrong
<xiaoguangrong.eric@xxxxxxxxx> wrote:
>
>
> On 10/31/2017 10:20 PM, Dan Williams wrote:
>>
>> On Tue, Oct 31, 2017 at 12:13 AM, Xiao Guangrong
>> <xiaoguangrong.eric@xxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 07/27/2017 08:54 AM, Dan Williams wrote:
>>>
>>>>> At that point, would it make sense to expose these special
>>>>> virtio-pmem areas to the guest in a slightly different way,
>>>>> so the regions that need virtio flushing are not bound by
>>>>> the regular driver, and the regular driver can continue to
>>>>> work for memory regions that are backed by actual pmem in
>>>>> the host?
>>>>
>>>>
>>>>
>>>> Hmm, yes that could be feasible especially if it uses the ACPI NFIT
>>>> mechanism. It would basically involve defining a new SPA (System
>>>> Physical Address) range GUID type, and then teaching libnvdimm to
>>>> treat that as a new pmem device type.
>>>
>>>
>>>
>>> I would prefer a new flush mechanism to a new memory type introduced
>>> to NFIT, e.g., in that mechanism we can define request queues and
>>> completion queues and any other features to make virtualization
>>> friendly. That would be much simpler.
>>>
>>
>> No, that's more confusing because now we are overloading the definition
>> of persistent memory. I want this memory type identified from the top
>> of the stack so it can appear differently in /proc/iomem and also
>> implement this alternate flush communication.
>>
>
> As for the characteristics of the memory, I don't see why the VM should
> know this difference. It can be completely transparent to the VM; that
> is, the VM does not need to know where this virtual PMEM comes from (a
> real NVDIMM backend or normal storage). The only difference is the
> flush interface.

It's not persistent memory if it requires a hypercall to make it
persistent. Unless memory writes can be made durable purely with CPU
instructions, it's dangerous for it to be treated as a PMEM range.
Consider a guest that tried to map it with device-dax, which has no
facility to route requests to a special flushing interface.

>
>> In what way is this "more complicated"? It was trivial to add support
>> for the "volatile" NFIT range, this will not be any more complicated
>> than that.
>>
>
> Introducing a memory type is easy indeed; however, a new flush interface
> definition is inevitable, i.e., we need a standard way to discover the
> MMIOs to communicate with the host.

Right, the proposed way to do that for x86 platforms is a new SPA
Range GUID type in the NFIT.
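
To make the idea concrete, here is a rough userspace sketch (not
libnvdimm code) of how a guest could classify an SPA range by its GUID
and refuse to treat a host-flushed range as ordinary PMEM. The enum,
the function names and GUID_HOST_FLUSH are all hypothetical; in
particular the host-flush GUID is a made-up placeholder, since no such
GUID has been allocated. The two existing GUIDs are the volatile and
persistent memory region GUIDs as I recall them from the ACPI
specification, so verify them before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Existing NFIT SPA range GUIDs (recalled from the ACPI spec; verify). */
#define GUID_VOLATILE   "7305944f-fdda-44e3-b16c-3f22d252e5d0"
#define GUID_PMEM       "66f0d379-b4f3-4074-ac43-0d3318b78cdb"
/* HYPOTHETICAL placeholder: no GUID has been allocated for this type. */
#define GUID_HOST_FLUSH "00000000-0000-0000-0000-000000000001"

/* What kind of memory an SPA range describes, as seen by the guest. */
enum spa_kind {
    SPA_UNKNOWN,
    SPA_VOLATILE,
    SPA_PMEM,            /* durable with CPU instructions alone */
    SPA_PMEM_HOST_FLUSH, /* hypothetical: durable only after asking the host */
};

/* Map a lowercase GUID string to a range kind. */
enum spa_kind classify_spa_guid(const char *guid)
{
    assert(guid != NULL);
    if (strcmp(guid, GUID_VOLATILE) == 0)
        return SPA_VOLATILE;
    if (strcmp(guid, GUID_PMEM) == 0)
        return SPA_PMEM;
    if (strcmp(guid, GUID_HOST_FLUSH) == 0)
        return SPA_PMEM_HOST_FLUSH;
    return SPA_UNKNOWN;
}

/* Only true PMEM can be made durable without leaving the guest. */
bool cpu_flush_is_sufficient(enum spa_kind kind)
{
    return kind == SPA_PMEM;
}

/*
 * device-dax has no hook for a special flushing interface, so in this
 * sketch only CPU-flushable ranges may be handed to it.
 */
bool may_bind_device_dax(enum spa_kind kind)
{
    return cpu_flush_is_sufficient(kind);
}
```

The point of keying everything off the range type is that the decision
is made once, at the top of the stack: a range carrying the new GUID is
never bound by the regular PMEM driver, so nothing downstream can
mistake it for memory that is durable with CPU instructions alone.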