On Mon, 30 Nov 2020 09:16:14 -0600
Shivaprasad G Bhat <sbhat@xxxxxxxxxxxxx> wrote:

> The nvdimm devices are expected to ensure write persistence during power
> failure kind of scenarios.
>
> The libpmem has architecture specific instructions like dcbf on power
> to flush the cache data to backend nvdimm device during normal writes.
>
> Qemu - virtual nvdimm devices are memory mapped. The dcbf in the guest
> doesn't translate to actual flush to the backend file on the host in case
> of file backed vnvdimms. This is addressed by virtio-pmem in case of x86_64
> by making asynchronous flushes.
>
> On PAPR, issue is addressed by adding a new hcall to
> request for an explicit asynchronous flush requests from the guest ndctl
> driver when the backend nvdimm cannot ensure write persistence with dcbf
> alone. So, the approach here is to convey when the asynchronous flush is
> required in a device tree property. The guest makes the hcall when the
> property is found, instead of relying on dcbf.
>
> The first patch adds the necessary asynchronous hcall support infrastructure
> code at the DRC level. Second patch implements the hcall using the
> infrastructure.
>
> Hcall semantics are in review and not final.
>
> A new device property sync-dax is added to the nvdimm device. When the
> sync-dax is off(default), the asynchronous hcalls will be called.
>
> With respect to save from new qemu to restore on old qemu, having the
> sync-dax by default off(when not specified) causes IO errors in guests as
> the async-hcall would not be supported on old qemu. The new hcall
> implementation being supported only on the new pseries machine version,
> the current machine version checks may be sufficient to prevent
> such migration. Please suggest what should be done.
>

First, all requests that are still not completed from the guest POV, i.e.
the hcall hasn't returned H_SUCCESS yet, are state that we should migrate
in theory.
In this case, I guess we would rather drain all pending requests on the
source in some pre-save handler.

Then, as explained in another mail, you should enforce stable behavior
for existing machine types with some hw_compat magic.

> The below demonstration shows the map_sync behavior with sync-dax on & off.
> (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)
>
> The pmem0 is from nvdimm with sync-dax=on, and pmem1 is from nvdimm with sync-dax=off, mounted as
> /dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
>
> [root@atest-guest ~]# ./mapsync /mnt1/newfile    ----> When sync-dax=off
> [root@atest-guest ~]# ./mapsync /mnt2/newfile    ----> when sync-dax=on
> Failed to mmap with Operation not supported
>
> ---
> v1 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg06330.html
> Changes from v1
>       - Fixed a missed-out unlock
>       - using QLIST_FOREACH instead of QLIST_FOREACH_SAFE while generating token
>
> Shivaprasad G Bhat (2):
>       spapr: drc: Add support for async hcalls at the drc level
>       spapr: nvdimm: Implement async flush hcalls
>
>
>  hw/mem/nvdimm.c            |    1
>  hw/ppc/spapr_drc.c         |  146 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_nvdimm.c      |   79 ++++++++++++++++++++++++
>  include/hw/mem/nvdimm.h    |   10 +++
>  include/hw/ppc/spapr.h     |    3 +
>  include/hw/ppc/spapr_drc.h |   25 ++++++++
>  6 files changed, 263 insertions(+), 1 deletion(-)
>
> --
> Signature
>
>
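
To illustrate the drain suggestion made earlier in this reply, here is a
minimal standalone sketch. It is only a model and not code from the series:
FlushState, FlushRequest, flush_submit(), flush_poll() and flush_pre_save()
are hypothetical names, and the "ticks" counter merely stands in for
whatever host-side work a real flush would be waiting on. The one thing
borrowed from QEMU is the shape of the callback, which matches the
int (*pre_save)(void *opaque) hook of a VMStateDescription.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical bookkeeping for one in-flight flush (not QEMU's structures). */
typedef struct FlushRequest {
    uint64_t token;     /* token the guest would use to poll for completion */
    int ticks_left;     /* stand-in for the remaining host-side work */
    bool in_use;
} FlushRequest;

typedef struct FlushState {
    FlushRequest reqs[MAX_PENDING];
    uint64_t next_token;
} FlushState;

/* Queue a flush; returns its token, or 0 when no slot is free. */
uint64_t flush_submit(FlushState *s, int ticks)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!s->reqs[i].in_use) {
            s->reqs[i].in_use = true;
            s->reqs[i].ticks_left = ticks > 0 ? ticks : 1;
            s->reqs[i].token = ++s->next_token;
            return s->reqs[i].token;
        }
    }
    return 0;
}

/* Number of flushes that have not completed yet. */
size_t flush_pending(const FlushState *s)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        n += s->reqs[i].in_use ? 1 : 0;
    }
    return n;
}

/* Advance every in-flight flush by one tick and retire the finished ones. */
void flush_poll(FlushState *s)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (s->reqs[i].in_use && --s->reqs[i].ticks_left <= 0) {
            s->reqs[i].in_use = false;
        }
    }
}

/*
 * Pre-save style handler: do not let the state be saved while a flush is
 * still outstanding, so that no half-done request needs to be migrated.
 */
int flush_pre_save(void *opaque)
{
    FlushState *s = opaque;

    assert(s != NULL);
    while (flush_pending(s) > 0) {
        flush_poll(s);
    }
    return 0;
}
```

With a zero-initialised FlushState, submitting a few requests and then
calling flush_pre_save() leaves flush_pending() at 0. In the real device the
loop would have to wait on the actual completion mechanism instead of a
counter, and the handler would only help if it is wired into the vmstate of
whatever object owns the pending list.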