Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 09/07/2015 10:11 PM, Igor Mammedov wrote:
On Fri, 14 Aug 2015 22:52:01 +0800
Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:

The parameter @file is used as backed memory for NVDIMM which is
divided into two parts if @dataconfig is true:
- first parts is (0, size - 128K], which is used as PMEM (Persistent
   Memory)
- 128K at the end of the file, which is used as Config Data Area, it's
   used to store Label namespace data

The @file supports both regular file and block device, of course we
can assign any these two kinds of files for test and emulation, however,
in the real word for performance reason, we usually used these files as
NVDIMM backed file:
- the regular file in the filesystem with DAX enabled created on NVDIMM
   device on host
- the raw PMEM device on host, e,g /dev/pmem0

A lot of code in this series could reuse what QEMU already
uses for implementing pc-dimm devices.

here is common concepts that could be reused.
   - on physical system both DIMM and NVDIMM devices use
     the same slots. We could share QEMU's '-m slots' option between
     both devices. An alternative to not sharing would be to introduce
     '-machine nvdimm_slots' option.
     And yes, we need to know number of NVDIMMs to describe
     them all in ACPI table (taking in amount future hotplug
     include in this possible NVDIMM devices)
     I'd go the same way as on real hardware on make them share the same slots.

I'd prefer sharing slots for pc-dimm and nvdimm, it's easier to reuse the
logic of slot-assignment and plug/unplug.

   - they share the same physical address space and limits
     on how much memory system can handle. So I'd suggest sharing existing
     '-m maxmem' option and reuse hotplug_memory address space.

Sounds good to me.


Essentially what I'm suggesting is to inherit NVDIMM's implementation
from pc-dimm reusing all of its code/backends and
just override parts that do memory mapping into guest's address space to
accommodate NVDIMM's requirements.

Good idea!

We have to differentiate pc-dimm and nvdimm in the common code and nvdimm
has different points with pc-dimm (for example, its has reserved-region, and
need support live migration of label data). How about rename 'pc-nvdimm' to
'memory-device' and make it as a common device type, then build pc-dimm and
nvdimm on top of it?

Something like:
static TypeInfo memory_device_info = {
    .name          = TYPE_MEM_DEV,
    .parent        = TYPE_DEVICE,
};

static TypeInfo memory_device_info = {
.name = TYPE_PC_DIMM,
.parent = TYPE_MEM_DEV,
};

static TypeInfo memory_device_info = {
.name = TYPE_NVDIMM,
.parent = TYPE_MEM_DEV,
};

It also make CONIFG_NVDIMM and CONFIG_HOT_PLUG be independent.



Signed-off-by: Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>
---
  hw/mem/nvdimm/pc-nvdimm.c  | 109 ++++++++++++++++++++++++++++++++++++++++++++-
  include/hw/mem/pc-nvdimm.h |   7 +++
  2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index 7a270a8..97710d1 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -22,12 +22,20 @@
   * License along with this library; if not, see <http://www.gnu.org/licenses/>
   */

+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+
+#include "exec/address-spaces.h"
  #include "hw/mem/pc-nvdimm.h"

-#define PAGE_SIZE      (1UL << 12)
+#define PAGE_SIZE               (1UL << 12)
+
+#define MIN_CONFIG_DATA_SIZE    (128 << 10)

  static struct nvdimms_info {
      ram_addr_t current_addr;
+    int device_index;
  } nvdimms_info;

  /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
@@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
      nvdimms_info.current_addr = offset;
  }

+static ram_addr_t reserved_range_push(uint64_t size)
+{
+    uint64_t current;
+
+    current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
+
+    /* do not have enough space? */
+    if (current + size < current) {
+        return 0;
+    }
+
+    nvdimms_info.current_addr = current + size;
+    return current;
+}
You can't use all memory above hotplug_memory area since
we have to tell guest where 64-bit PCI window starts,
and currently it should start at reserved-memory-end
(but it isn't due to a bug: I've just posted fix to qemu-devel
  "[PATCH 0/2] pc: fix 64-bit PCI window clashing with memory hotplug region"
)

Ah, got it, thanks for you pointing it out.


+
+static uint32_t new_device_index(void)
+{
+    return nvdimms_info.device_index++;
+}
+
  static char *get_file(Object *obj, Error **errp)
  {
      PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
@@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
  {
      PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);

+    if (memory_region_size(&nvdimm->mr)) {
+        error_setg(errp, "cannot change property value");
+        return;
+    }
+
      if (nvdimm->file) {
          g_free(nvdimm->file);
      }
@@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
                               set_configdata, NULL);
  }

+static uint64_t get_file_size(int fd)
+{
+    struct stat stat_buf;
+    uint64_t size;
+
+    if (fstat(fd, &stat_buf) < 0) {
+        return 0;
+    }
+
+    if (S_ISREG(stat_buf.st_mode)) {
+        return stat_buf.st_size;
+    }
+
+    if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
+        return size;
+    }
+
+    return 0;
+}
All this file stuff I'd leave to already existing backends like
memory-backend-file or even memory-backend-ram which already do
above and more allowing to configure persistent and volatile
NVDIMMs without changing NVDIMM fronted code.


The current memory backends use all memory size and map it to guest's
address space. However, nvdimm needs a reserved region for its label
data which is only accessed in Qemu.

How about introduce two parameters, "reserved_size" and "reserved_addr"
to TYPE_MEMORY_BACKEND, then only the memory region
[0, size - reserved_size) is mapped to guest and the remain part is
pointed by "reserved_addr"?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux