On 4/29/21 9:25 PM, Stefan Hajnoczi wrote:
On Wed, Apr 28, 2021 at 11:48:21PM -0400, Shivaprasad G Bhat wrote:
The nvdimm devices are expected to ensure write persistence in scenarios
such as power failure.
libpmem uses architecture-specific instructions, such as dcbf on POWER,
to flush cached data to the backend nvdimm device during normal writes,
followed by explicit flushes if the backend devices are not synchronous
DAX capable.
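For context, the dcbf-based flush amounts to roughly the following
(an illustrative sketch only; the exact barrier sequence used by libpmem
and the kernel may differ):

/* Illustrative: write one dirty cache block back to the nvdimm and
 * order the flush against subsequent stores. */
static inline void flush_pmem_line(const void *addr)
{
    asm volatile("dcbf 0,%0" : : "r"(addr) : "memory");
    asm volatile("sync" : : : "memory");
}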
Qemu virtual nvdimm devices are memory mapped. The dcbf in the guest
and the subsequent flush do not translate to an actual flush of the backend
file on the host in the case of file-backed v-nvdimms. On x86_64 this is
addressed by virtio-pmem, where the explicit flushes translate to an
fsync in Qemu.
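(For reference, the host-side persistence step in that model boils down
to an fsync of the backing file. A minimal sketch, with a hypothetical
helper name:)

#include <errno.h>
#include <unistd.h>

/* Sketch: persist a file-backed virtual nvdimm on the host. The guest's
 * explicit flush request ultimately becomes an fsync on the backing
 * file descriptor. */
static int backend_flush(int backing_fd)
{
    if (fsync(backing_fd) < 0)
        return -errno;
    return 0;
}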
On SPAPR, the issue is addressed by adding a new hcall through which the
guest ndctl driver requests an explicit flush when the backend nvdimm
cannot ensure write persistence with dcbf alone. So, the approach here is
to convey through a device tree property when the hcall flush is required.
The guest makes the hcall when the property is found, instead
of relying on dcbf.
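A minimal sketch of that guest-side policy (the property string and
function names are assumptions based on this series, not final upstream
code):

/* Decide at flush time whether to use the H_SCM_FLUSH hcall or the plain
 * dcbf-based barrier. hcall_flush_required would be set at probe time from
 * a device tree property such as "ibm,hcall-flush-required". */
static void vpmem_flush(struct papr_scm_priv *p)
{
    if (p->hcall_flush_required)
        papr_scm_flush_hcall(p);      /* hypothetical H_SCM_FLUSH wrapper */
    else
        arch_pmem_flush_barrier();    /* dcbf + sync path */
}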
Sorry, I'm not very familiar with SPAPR. Why add a hypercall when the
virtio-nvdimm device already exists?
On virtualized ppc64 platforms, guests use the papr_scm.ko kernel driver
for persistent memory support. This was done so that we can use one kernel
driver to support persistent memory with multiple hypervisors. To avoid
supporting multiple drivers in the guest, the -device nvdimm Qemu
command line results in Qemu using the PAPR SCM backend. What this patch
series does is make sure we expose the correct synchronous fault
support when we back such an nvdimm device with a file.
The existing PAPR SCM backend enables persistent memory support with the
help of multiple hypercalls:
#define H_SCM_READ_METADATA 0x3E4
#define H_SCM_WRITE_METADATA 0x3E8
#define H_SCM_BIND_MEM 0x3EC
#define H_SCM_UNBIND_MEM 0x3F0
#define H_SCM_UNBIND_ALL 0x3FC
Most of them are already implemented in Qemu. This patch series
implements the H_SCM_FLUSH hypercall.
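A rough sketch of what the Qemu-side handler for that hcall could look
like (hedged: the backend fd lookup shown here is illustrative, and the
real handler in the series performs the flush asynchronously with continue
tokens rather than a synchronous fsync):

static target_ulong h_scm_flush(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                target_ulong opcode, target_ulong *args)
{
    uint32_t drc_index = args[0];
    SpaprDrc *drc = spapr_drc_by_index(drc_index);
    PCDIMMDevice *dimm;
    int backend_fd;

    if (!drc || !drc->dev ||
        spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
        return H_PARAMETER;
    }

    /* resolve the nvdimm's memory backend and persist its backing file */
    dimm = PC_DIMM(drc->dev);
    backend_fd = memory_region_get_fd(&dimm->hostmem->mr);
    if (fsync(backend_fd) < 0) {
        return H_HARDWARE;
    }

    return H_SUCCESS;
}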
-aneesh