On Sun, Nov 5, 2017 at 11:57 PM, Pankaj Gupta <pagupta@xxxxxxxxxx> wrote:
>
>
>> [..]
>> >> Yes, the GUID will specifically identify this range as "Virtio Shared
>> >> Memory" (or whatever name survives after a bikeshed debate). The
>> >> libnvdimm core then needs to grow a new region type that mostly
>> >> behaves the same as a "pmem" region, but drivers/nvdimm/pmem.c grows a
>> >> new flush interface to perform the host communication. Device-dax
>> >> would be disallowed from attaching to this region type, or we could
>> >> grow a new device-dax type that does not allow the raw device to be
>> >> mapped, but allows a filesystem mounted on top to manage the flush
>> >> interface.
>> >
>> >
>> > I am afraid it is not a good idea to use a single SPA range for multiple
>> > purposes. The region used as "pmem" is directly mapped into the VM so
>> > that the guest can access it freely without the host's assistance,
>> > whereas the region used for "host communication" is not mapped into the
>> > VM, so touching it causes a VM-exit and gives the host the chance to
>> > perform specific operations, e.g., flushing the cache. So we had better
>> > define these two regions distinctly to avoid unnecessary complexity in
>> > the hypervisor.
>>
>> Good point, I was assuming that the mmio flush interface would be
>> discovered separately from the NFIT-defined memory range. Perhaps via
>> PCI in the guest? This piece of the proposal needs a bit more
>> thought...
>
> Also, in earlier discussions we agreed on flushing the entire device
> whenever the guest performs an fsync on a DAX file. If we do an MMIO call
> for this, the guest CPU would be trapped for the duration of the device
> flush.
>
> Instead, if we perform an asynchronous flush, couldn't the guest CPUs be
> used by other tasks until the flush completes?

Yes, the interface for the guest to trigger and wait for flush requests
should be asynchronous, just like a storage "flush-cache" command.
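
To make that concrete, here is a rough, purely illustrative sketch (not an
agreed-upon interface) of how a guest driver could queue a flush request to
the host over a virtqueue and then sleep until the host acknowledges it,
instead of trapping on an MMIO write for the whole flush. All the names
below (vpmem_device, virtio_flush_req, vpmem_async_flush, the single
request/response queue) are made up for illustration; only the standard
virtio and completion kernel APIs are real.

/*
 * Hypothetical sketch only: an asynchronous guest->host flush request
 * carried over a virtio queue.  The structures and function names are
 * placeholders, not part of any existing driver.
 */
#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/virtio.h>

struct virtio_flush_req {
	u32 ret;			/* host writes flush status here */
	struct completion done;		/* signalled from the vq callback */
};

struct vpmem_device {
	struct virtqueue *req_vq;	/* single request/response queue */
	spinlock_t lock;
};

/* Virtqueue callback: the host has completed one or more flush requests. */
static void vpmem_done(struct virtqueue *vq)
{
	struct vpmem_device *vpmem = vq->vdev->priv;
	struct virtio_flush_req *req;
	unsigned long flags;
	unsigned int len;

	spin_lock_irqsave(&vpmem->lock, flags);
	while ((req = virtqueue_get_buf(vq, &len)) != NULL)
		complete(&req->done);
	spin_unlock_irqrestore(&vpmem->lock, flags);
}

/*
 * Submit a flush request and sleep until the host acknowledges it.
 * The vCPU is not held in a trap while the host writes back the device;
 * it schedules other work until vpmem_done() fires.
 */
static int vpmem_async_flush(struct vpmem_device *vpmem)
{
	struct virtio_flush_req *req;
	struct scatterlist sg_in, *sgs[1];
	unsigned long flags;
	int err, ret;

	req = kmalloc(sizeof(*req), GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	req->ret = 0;
	init_completion(&req->done);
	sg_init_one(&sg_in, &req->ret, sizeof(req->ret));
	sgs[0] = &sg_in;

	spin_lock_irqsave(&vpmem->lock, flags);
	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 0, 1, req, GFP_ATOMIC);
	if (!err)
		virtqueue_kick(vpmem->req_vq);
	spin_unlock_irqrestore(&vpmem->lock, flags);
	if (err) {
		kfree(req);
		return err;
	}

	/* Other tasks run on this CPU while the host performs the flush. */
	wait_for_completion(&req->done);
	ret = req->ret ? -EIO : 0;
	kfree(req);
	return ret;
}

A filesystem fsync on the DAX file could then end up calling something like
vpmem_async_flush() through whatever flush hook the new region type grows,
keeping the "flush-cache"-style semantics without pinning the vCPU.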