Xiao, Are these patches present in any git tree so that they can be easily tried out. Regards, Bharata. On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote: > Changelog in v3: > There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > Michael for their valuable comments, the patchset finally gets better shape. > - changes from Igor's comments: > 1) abstract dimm device type from pc-dimm and create nvdimm device based on > dimm, then it uses memory backend device as nvdimm's memory and NUMA has > easily been implemented. > 2) let file-backend device support any kind of filesystem not only for > hugetlbfs and let it work on file not only for directory which is > achieved by extending 'mem-path' - if it's a directory then it works as > current behavior, otherwise if it's file then directly allocates memory > from it. > 3) we figure out a unused memory hole below 4G that is 0xFF00000 ~ > 0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit > ACPI SSDT/DSDT table will break windows XP. > BTW, only make SSDT.rev = 2 can not work since the width is only depended > on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) > in ACPI spec: > | Note: For compatibility with ACPI versions before ACPI 2.0, the bit > | width of Integer objects is dependent on the ComplianceRevision of the DSDT. > | If the ComplianceRevision is less than 2, all integers are restricted to 32 > | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets > | the global integer width for all integers, including integers in SSDTs. > 4) use the lowest ACPI spec version to document AML terms. > 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" > > - changes from Stefan's comments: > 1) do not do endian adjustment in-place since _DSM memory is visible to guest > 2) use target platform's target page size instead of fixed PAGE_SIZE > definition > 3) lots of code style improvement and typo fixes. > 4) live migration fix > - changes from Paolo's comments: > 1) improve the name of memory region > > - other changes: > 1) return exact buffer size for _DSM method instead of the page size. > 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm > devices. > 3) NUMA support > 4) implement _FIT method > 5) rename "configdata" to "reserve-label-data" > 6) simplify _DSM arg3 determination > 7) main changelog update to let it reflect v3. > > Changlog in v2: > - Use litten endian for DSM method, thanks for Stefan's suggestion > > - introduce a new parameter, @configdata, if it's false, Qemu will > build a static and readonly namespace in memory and use it serveing > for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no > reserved region is needed at the end of the @file, it is good for > the user who want to pass whole nvdimm device and make its data > completely be visible to guest > > - divide the source code into separated files and add maintain info > > BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will > be posted on next week > > ====== Background ====== > NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported > on Intel's platform. They are discovered via ACPI and configured by _DSM > method of NVDIMM device in ACPI. There has some supporting documents which > can be found at: > ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf > NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf > DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf > Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf > > Currently, the NVDIMM driver has been merged into upstream Linux Kernel and > this patchset tries to enable it in virtualization field > > ====== Design ====== > NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's > address space then CPU can directly access it as normal memory, another is > BLK which is used as block device to reduce the occupying of CPU address > space > > BLK mode accesses NVDIMM via Command Register window and Data Register window. > BLK virtualization has high workload since each sector access will cause at > least two VM-EXIT. So we currently only imperilment vPMEM in this patchset > > --- vPMEM design --- > We introduce a new device named "nvdimm", it uses memory backend device as > NVDIMM memory. The file in file-backend device can be a regular file and block > device. We can use any file when we do test or emulation, however, > in the real word, the files passed to guest are: > - the regular file in the filesystem with DAX enabled created on NVDIMM device > on host > - the raw PMEM device on host, e,g /dev/pmem0 > Memory access on the address created by mmap on these kinds of files can > directly reach NVDIMM device on host. > > --- vConfigure data area design --- > Each NVDIMM device has a configure data area which is used to store label > namespace data. In order to emulating this area, we divide the file into two > parts: > - first parts is (0, size - 128K], which is used as PMEM > - 128K at the end of the file, which is used as Label Data Area > So that the label namespace data can be persistent during power lose or system > failure. > > We also support passing the whole file to guest without reserve any region for > label data area which is achieved by "reserve-label-data" parameter - if it's > false then QEMU will build static and readonly namespace in memory and that > namespace contains the whole file size. The parameter is false on default. > > --- _DSM method design --- > _DSM in ACPI is used to configure NVDIMM, currently we only allow access of > label namespace data, i.e, Get Namespace Label Size (Function Index 4), > Get Namespace Label Data (Function Index 5) and Set Namespace Label Data > (Function Index 6) > > _DSM uses two pages to transfer data between ACPI and Qemu, the first page > is RAM-based used to save the input info of _DSM method and Qemu reuse it > store output info and another page is MMIO-based, ACPI write data to this > page to transfer the control to Qemu > > ====== Test ====== > In host > 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10 > 2) append "-object memory-backend-file,share,id=mem1, > mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data, > id=nv1" in QEMU command line > > In guest, download the latest upsteam kernel (4.2 merge window) and enable > ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM. > 1) insmod drivers/nvdimm/libnvdimm.ko > 2) insmod drivers/acpi/nfit.ko > 3) insmod drivers/nvdimm/nd_btt.ko > 4) insmod drivers/nvdimm/nd_pmem.ko > You can see the whole nvdimm device used as a single namespace and /dev/pmem0 > appears. You can do whatever on /dev/pmem0 including DAX access. > > Currently Linux NVDIMM driver does not support namespace operation on this > kind of PMEM, apply below changes to support dynamical namespace: > > @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a > continue; > } > > - if (nfit_mem->bdw && nfit_mem->memdev_pmem) > + //if (nfit_mem->bdw && nfit_mem->memdev_pmem) > + if (nfit_mem->memdev_pmem) > flags |= NDD_ALIASING; > > You can append another NVDIMM device in guest and do: > # cd /sys/bus/nd/devices/ > # cd namespace1.0/ > # echo `uuidgen` > uuid > # echo `expr 1024 \* 1024 \* 128` > size > then reload nd.pmem.ko > > You can see /dev/pmem1 appears > > ====== TODO ====== > NVDIMM hotplug support > > Xiao Guangrong (32): > acpi: add aml_derefof > acpi: add aml_sizeof > acpi: add aml_create_field > acpi: add aml_mutex, aml_acquire, aml_release > acpi: add aml_concatenate > acpi: add aml_object_type > util: introduce qemu_file_get_page_size() > exec: allow memory to be allocated from any kind of path > exec: allow file_ram_alloc to work on file > hostmem-file: clean up memory allocation > hostmem-file: use whole file size if possible > pc-dimm: remove DEFAULT_PC_DIMMSIZE > pc-dimm: make pc_existing_dimms_capacity static and rename it > pc-dimm: drop the prefix of pc-dimm > stubs: rename qmp_pc_dimm_device_list.c > pc-dimm: rename pc-dimm.c and pc-dimm.h > dimm: abstract dimm device from pc-dimm > dimm: get mapped memory region from DIMMDeviceClass->get_memory_region > dimm: keep the state of the whole backend memory > dimm: introduce realize callback > nvdimm: implement NVDIMM device abstract > nvdimm: init the address region used by NVDIMM ACPI > nvdimm: build ACPI NFIT table > nvdimm: init the address region used by DSM method > nvdimm: build ACPI nvdimm devices > nvdimm: save arg3 for NVDIMM device _DSM method > nvdimm: support DSM_CMD_IMPLEMENTED function > nvdimm: support DSM_CMD_NAMESPACE_LABEL_SIZE function > nvdimm: support DSM_CMD_GET_NAMESPACE_LABEL_DATA > nvdimm: support DSM_CMD_SET_NAMESPACE_LABEL_DATA > nvdimm: allow using whole backend memory as pmem > nvdimm: add maintain info > > MAINTAINERS | 6 + > backends/hostmem-file.c | 58 +- > default-configs/i386-softmmu.mak | 2 + > default-configs/x86_64-softmmu.mak | 2 + > exec.c | 113 ++- > hmp.c | 2 +- > hw/Makefile.objs | 2 +- > hw/acpi/aml-build.c | 83 ++ > hw/acpi/ich9.c | 8 +- > hw/acpi/memory_hotplug.c | 26 +- > hw/acpi/piix4.c | 8 +- > hw/i386/acpi-build.c | 4 + > hw/i386/pc.c | 37 +- > hw/mem/Makefile.objs | 3 + > hw/mem/{pc-dimm.c => dimm.c} | 240 ++--- > hw/mem/nvdimm/acpi.c | 961 +++++++++++++++++++++ > hw/mem/nvdimm/internal.h | 41 + > hw/mem/nvdimm/namespace.c | 309 +++++++ > hw/mem/nvdimm/nvdimm.c | 136 +++ > hw/mem/pc-dimm.c | 506 +---------- > hw/ppc/spapr.c | 20 +- > include/hw/acpi/aml-build.h | 8 + > include/hw/i386/pc.h | 4 +- > include/hw/mem/{pc-dimm.h => dimm.h} | 68 +- > include/hw/mem/nvdimm.h | 58 ++ > include/hw/mem/pc-dimm.h | 105 +-- > include/hw/ppc/spapr.h | 2 +- > include/qemu/osdep.h | 1 + > numa.c | 4 +- > qapi-schema.json | 8 +- > qmp.c | 4 +- > stubs/Makefile.objs | 2 +- > ...c_dimm_device_list.c => qmp_dimm_device_list.c} | 4 +- > target-ppc/kvm.c | 21 +- > trace-events | 8 +- > util/oslib-posix.c | 16 + > util/oslib-win32.c | 5 + > 37 files changed, 2023 insertions(+), 862 deletions(-) > rename hw/mem/{pc-dimm.c => dimm.c} (65%) > create mode 100644 hw/mem/nvdimm/acpi.c > create mode 100644 hw/mem/nvdimm/internal.h > create mode 100644 hw/mem/nvdimm/namespace.c > create mode 100644 hw/mem/nvdimm/nvdimm.c > rewrite hw/mem/pc-dimm.c (91%) > rename include/hw/mem/{pc-dimm.h => dimm.h} (50%) > create mode 100644 include/hw/mem/nvdimm.h > rewrite include/hw/mem/pc-dimm.h (97%) > rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (56%) > > -- > 1.8.3.1 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- http://raobharata.wordpress.com/ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html