On Tue, 8 Sep 2015 21:38:17 +0800
Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:

>
> On 09/07/2015 10:11 PM, Igor Mammedov wrote:
> > On Fri, 14 Aug 2015 22:52:01 +0800
> > Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> >
> >> The parameter @file is used as the backing memory for the NVDIMM; it is
> >> divided into two parts if @dataconfig is true:
> >> - the first part, (0, size - 128K], is used as PMEM (Persistent Memory)
> >> - the 128K at the end of the file is used as the Config Data Area, which
> >>   stores Label namespace data
> >>
> >> @file supports both regular files and block devices, and either kind can
> >> be assigned for test and emulation. In the real world, however, for
> >> performance reasons we would usually use one of these as the NVDIMM
> >> backing file:
> >> - a regular file, created on an NVDIMM device on the host, in a
> >>   filesystem with DAX enabled
> >> - a raw PMEM device on the host, e.g. /dev/pmem0
> >
> > A lot of code in this series could reuse what QEMU already
> > uses for implementing pc-dimm devices.
> >
> > Here are common concepts that could be reused:
> > - On a physical system both DIMM and NVDIMM devices use the same slots.
> >   We could share QEMU's '-m slots' option between both devices. An
> >   alternative to sharing would be to introduce a '-machine nvdimm_slots'
> >   option. And yes, we need to know the number of NVDIMMs to describe them
> >   all in the ACPI table (taking into account possible future hotplug of
> >   NVDIMM devices).
> >   I'd go the same way as on real hardware and make them share the same
> >   slots.
>
> I'd prefer sharing slots between pc-dimm and nvdimm; it makes it easier to
> reuse the logic of slot assignment and plug/unplug.
>
> > - They share the same physical address space and the limits on how much
> >   memory the system can handle. So I'd suggest sharing the existing
> >   '-m maxmem' option and reusing the hotplug_memory address space.
>
> Sounds good to me.
>
> > Essentially what I'm suggesting is to inherit NVDIMM's implementation
> > from pc-dimm, reusing all of its code/backends and just overriding the
> > parts that do memory mapping into the guest's address space to
> > accommodate NVDIMM's requirements.
>
> Good idea!
>
> We have to differentiate pc-dimm and nvdimm in the common code, and nvdimm
> differs from pc-dimm in some respects (for example, it has a reserved
> region and needs to support live migration of label data). How about
> renaming 'pc-nvdimm' to 'memory-device', making it a common device type,
> and then building pc-dimm and nvdimm on top of it?

Sounds good, only I'd call it just 'dimm', as 'memory-device' is too broad.
Also I'd make the base class abstract.

> Something like:
>
> static TypeInfo memory_device_info = {
>     .name   = TYPE_MEM_DEV,
>     .parent = TYPE_DEVICE,
> };
>
> static TypeInfo pc_dimm_info = {
>     .name   = TYPE_PC_DIMM,
>     .parent = TYPE_MEM_DEV,
> };
>
> static TypeInfo nvdimm_info = {
>     .name   = TYPE_NVDIMM,
>     .parent = TYPE_MEM_DEV,
> };
>
> It also makes CONFIG_NVDIMM and CONFIG_HOT_PLUG independent.
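[Editor's sketch] QOM supports abstract base types directly, so making the proposed base class abstract is a one-line flag in its TypeInfo. The fragment below does not compile outside QEMU, and the `DIMMDevice`/`DIMMDeviceClass` struct names are illustrative:

```c
/* Abstract base type: QOM refuses to instantiate it directly, so only
 * the concrete pc-dimm and nvdimm subtypes can be created. */
static const TypeInfo dimm_info = {
    .name          = TYPE_DIMM,
    .parent        = TYPE_DEVICE,
    .instance_size = sizeof(DIMMDevice),
    .class_size    = sizeof(DIMMDeviceClass),
    .abstract      = true,
};
```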
> >
> >> Signed-off-by: Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>
> >> ---
> >>  hw/mem/nvdimm/pc-nvdimm.c  | 109 ++++++++++++++++++++++++++++++++++++++++++++-
> >>  include/hw/mem/pc-nvdimm.h |   7 +++
> >>  2 files changed, 115 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> >> index 7a270a8..97710d1 100644
> >> --- a/hw/mem/nvdimm/pc-nvdimm.c
> >> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> >> @@ -22,12 +22,20 @@
> >>   * License along with this library; if not, see <http://www.gnu.org/licenses/>
> >>   */
> >>
> >> +#include <sys/mman.h>
> >> +#include <sys/ioctl.h>
> >> +#include <linux/fs.h>
> >> +
> >> +#include "exec/address-spaces.h"
> >>  #include "hw/mem/pc-nvdimm.h"
> >>
> >> -#define PAGE_SIZE               (1UL << 12)
> >> +#define PAGE_SIZE               (1UL << 12)
> >> +
> >> +#define MIN_CONFIG_DATA_SIZE    (128 << 10)
> >>
> >>  static struct nvdimms_info {
> >>      ram_addr_t current_addr;
> >> +    int device_index;
> >>  } nvdimms_info;
> >>
> >>  /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
> >> @@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
> >>      nvdimms_info.current_addr = offset;
> >>  }
> >>
> >> +static ram_addr_t reserved_range_push(uint64_t size)
> >> +{
> >> +    uint64_t current;
> >> +
> >> +    current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
> >> +
> >> +    /* do not have enough space? */
> >> +    if (current + size < current) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    nvdimms_info.current_addr = current + size;
> >> +    return current;
> >> +}
> >
> > You can't use all memory above the hotplug_memory area, since
> > we have to tell the guest where the 64-bit PCI window starts,
> > and currently it should start at reserved-memory-end
> > (but it doesn't, due to a bug: I've just posted a fix to qemu-devel,
> > "[PATCH 0/2] pc: fix 64-bit PCI window clashing with memory hotplug region").
>
> Ah, got it, thanks for pointing it out.
> >
> >> +
> >> +static uint32_t new_device_index(void)
> >> +{
> >> +    return nvdimms_info.device_index++;
> >> +}
> >> +
> >>  static char *get_file(Object *obj, Error **errp)
> >>  {
> >>      PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> >> @@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
> >>  {
> >>      PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> >>
> >> +    if (memory_region_size(&nvdimm->mr)) {
> >> +        error_setg(errp, "cannot change property value");
> >> +        return;
> >> +    }
> >> +
> >>      if (nvdimm->file) {
> >>          g_free(nvdimm->file);
> >>      }
> >> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
> >>                                   set_configdata, NULL);
> >>  }
> >>
> >> +static uint64_t get_file_size(int fd)
> >> +{
> >> +    struct stat stat_buf;
> >> +    uint64_t size;
> >> +
> >> +    if (fstat(fd, &stat_buf) < 0) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    if (S_ISREG(stat_buf.st_mode)) {
> >> +        return stat_buf.st_size;
> >> +    }
> >> +
> >> +    if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
> >> +        return size;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >
> > All this file stuff I'd leave to the already existing backends like
> > memory-backend-file or even memory-backend-ram, which already do the
> > above and more, allowing persistent and volatile NVDIMMs to be
> > configured without changing NVDIMM frontend code.
>
> The current memory backends use the whole memory size and map it into the
> guest's address space. However, nvdimm needs a reserved region for its
> label data, which is only accessed inside QEMU.
>
> How about introducing two parameters, "reserved_size" and "reserved_addr",
> to TYPE_MEMORY_BACKEND? Then only the memory region
> [0, size - reserved_size) is mapped into the guest, and the remaining part
> is pointed to by "reserved_addr".

It looks like "reserved_size" alone is sufficient, if there aren't any plans
or reasons to keep the label area anywhere other than at the end of the
NVDIMM. Also, keeping the reserved area in the same backend MemoryRegion
means it should automatically be migrated during live migration.
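[Editor's sketch] QEMU's existing MemoryRegion alias mechanism can express this backend split: the guest maps an alias covering only the first `size - reserved_size` bytes, while the trailing label area stays QEMU-internal yet lives in the same RAM block (and so migrates with it). The fragment below does not compile outside QEMU, and the `nvdimm->hostmem`/`nvdimm->pmem_mr` field names are invented for illustration:

```c
/* Guest sees only the first (size - reserved_size) bytes of the backend. */
MemoryRegion *backend_mr = host_memory_backend_get_memory(nvdimm->hostmem,
                                                          errp);
uint64_t label_size = 128 * 1024;                       /* reserved_size */
uint64_t pmem_size = memory_region_size(backend_mr) - label_size;

memory_region_init_alias(&nvdimm->pmem_mr, OBJECT(nvdimm), "nvdimm-pmem",
                         backend_mr, 0, pmem_size);
/* map the alias, not the whole backend region, into the guest */
memory_region_add_subregion(hotplug_mr, addr, &nvdimm->pmem_mr);

/* direct host pointer to the label area for QEMU-internal access */
uint8_t *label_data = (uint8_t *)memory_region_get_ram_ptr(backend_mr)
                      + pmem_size;
```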
To separate out and map the guest-visible data part of the backend's
MemoryRegion you could use memory_region_init_alias().

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html