Re: [RFC 07/18] vfio: Add base object for VFIOContainer

On 2022/4/29 14:29, David Gibson wrote:
On Thu, Apr 14, 2022 at 03:46:59AM -0700, Yi Liu wrote:
Qomify the VFIOContainer object which acts as a base class for a
container. This base class is derived into the legacy VFIO container
and later on, into the new iommufd based container.

You certainly need the abstraction, but I'm not sure QOM is the right
way to accomplish it in this case.  The QOM class of things is visible
to the user/config layer via QMP (and sometimes command line).  It
doesn't necessarily correspond to guest visible differences, but it
often does.
Got it. By the way, this series adds an iommufd option in the patch below.
Do you think that fits the notion that a QOM class is mostly visible to the
user/config layer?

https://lore.kernel.org/kvm/20220414104710.28534-19-yi.l.liu@xxxxxxxxx/

AIUI, the idea here is that the back end in use should be an
implementation detail which doesn't affect the interfaces outside the
vfio subsystem itself.  If that's the case QOM may not be a great
fit, even though you can probably make it work.

Yes, currently the implementation detail is confined to the vfio subsystem.
So if the iommufd option is not reason enough to use QOM for the
abstraction, I may just add an abstraction within vfio as you suggested.
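
To make the alternative concrete, here is a rough sketch of what a
subsystem-private abstraction without QOM could look like: a plain table of
function pointers chosen once at setup. Every name in it (DemoContainer,
DemoContainerOps, demo_legacy_ops, demo_iommufd_ops and the helpers) is
hypothetical and comes neither from this series nor from QEMU, and the
callbacks only count mappings instead of issuing ioctls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none come from QEMU or this series. */
typedef struct DemoContainer DemoContainer;

/* Backend callbacks, private to the subsystem: no QOM type is registered. */
typedef struct DemoContainerOps {
    const char *name;
    int (*dma_map)(DemoContainer *c, uint64_t iova, uint64_t size);
} DemoContainerOps;

struct DemoContainer {
    const DemoContainerOps *ops; /* chosen once, at container setup */
    unsigned int nr_mappings;    /* backend-independent bookkeeping */
};

/* Stand-in for the legacy (/dev/vfio/vfio) mapping path. */
static int demo_legacy_dma_map(DemoContainer *c, uint64_t iova, uint64_t size)
{
    (void)iova;
    (void)size;
    c->nr_mappings++;
    return 0;
}

/* Stand-in for an iommufd mapping path. */
static int demo_iommufd_dma_map(DemoContainer *c, uint64_t iova, uint64_t size)
{
    (void)iova;
    (void)size;
    c->nr_mappings++;
    return 0;
}

static const DemoContainerOps demo_legacy_ops = {
    .name = "legacy",
    .dma_map = demo_legacy_dma_map,
};

static const DemoContainerOps demo_iommufd_ops = {
    .name = "iommufd",
    .dma_map = demo_iommufd_dma_map,
};

/* Backend selection happens here and is invisible to callers. */
static void demo_container_init(DemoContainer *c, int use_iommufd)
{
    assert(c != NULL);
    c->ops = use_iommufd ? &demo_iommufd_ops : &demo_legacy_ops;
    c->nr_mappings = 0;
}

/* Generic code calls this wrapper and never names a backend. */
static int demo_container_dma_map(DemoContainer *c, uint64_t iova,
                                  uint64_t size)
{
    return c->ops->dma_map(c, iova, size);
}
```

With this shape the choice of backend never leaves the subsystem, which
seems to be the property asked for above; the cost is giving up what QOM
provides for free, such as type registration and checked casts.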

The base class implements generic code such as code related to
memory_listener and address space management whereas the derived
class implements callbacks that depend on the kernel user-space
interface being used.

'as.c' only manipulates the base class object with wrapper functions
that call the right class functions. Existing 'container.c' code is
converted to implement the legacy container class functions.
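
As a rough illustration of the wrapper pattern described here, the sketch
below dispatches through a class structure and falls back to a default when
a backend leaves a callback unset. The fallback values mirror the ones used
in container-obj.c in this patch (-EINVAL for a missing dma_map, 0 for a
missing add_window); the type and function names (DemoContainer,
DemoContainerClass, demo_*) are made up for the example and the class is a
plain struct rather than a QOM class.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the base object and its class. */
typedef struct DemoContainer DemoContainer;

typedef struct DemoContainerClass {
    int (*dma_map)(DemoContainer *c, uint64_t iova, uint64_t size);
    int (*add_window)(DemoContainer *c, uint64_t start, uint64_t size);
} DemoContainerClass;

struct DemoContainer {
    const DemoContainerClass *klass;
};

/* Mapping is mandatory: a backend without the callback is an error. */
static int demo_container_dma_map(DemoContainer *c, uint64_t iova,
                                  uint64_t size)
{
    assert(c != NULL && c->klass != NULL);
    if (!c->klass->dma_map) {
        return -EINVAL;
    }
    return c->klass->dma_map(c, iova, size);
}

/* Windows are optional: a backend without the callback has nothing to do. */
static int demo_container_add_window(DemoContainer *c, uint64_t start,
                                     uint64_t size)
{
    assert(c != NULL && c->klass != NULL);
    if (!c->klass->add_window) {
        return 0;
    }
    return c->klass->add_window(c, start, size);
}

/* A trivial backend callback that always succeeds. */
static int demo_map_ok(DemoContainer *c, uint64_t iova, uint64_t size)
{
    (void)c;
    (void)iova;
    (void)size;
    return 0;
}

/* One backend implements dma_map only; the other implements nothing. */
static const DemoContainerClass demo_map_only_class = {
    .dma_map = demo_map_ok,
    .add_window = NULL,
};

static const DemoContainerClass demo_empty_class = {
    .dma_map = NULL,
    .add_window = NULL,
};
```

Callers such as the memory listener then only ever see the demo_container_*
entry points, so they stay identical whichever backend sits underneath.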

Existing migration code only works with the legacy container.
Also, 'spapr.c' isn't backend (BE) agnostic.

Below is the object. It is named VFIOContainer; the old VFIOContainer
is renamed to VFIOLegacyContainer.

struct VFIOContainer {
     /* private */
     Object parent_obj;

     VFIOAddressSpace *space;
     MemoryListener listener;
     Error *error;
     bool initialized;
     bool dirty_pages_supported;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
     unsigned int dma_max_mappings;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     QLIST_ENTRY(VFIOContainer) next;
};

struct VFIOLegacyContainer {
     VFIOContainer obj;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     MemoryListener prereg_listener;
     unsigned iommu_type;
     QLIST_HEAD(, VFIOGroup) group_list;
};
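
The legacy callbacks in this patch get from the base pointer back to the
derived structure with container_of(), which works because the base object
is embedded as the 'obj' member. A minimal standalone sketch of that
downcast follows; DemoBase, DemoLegacy, demo_container_of and demo_legacy_fd
are illustrative names, and the macro is a simplified version without the
type checking that QEMU's own container_of performs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object holding backend-independent state. */
typedef struct DemoBase {
    unsigned long pgsizes;
} DemoBase;

/* Hypothetical derived object embedding the base as its 'obj' member. */
typedef struct DemoLegacy {
    DemoBase obj;
    int fd;
} DemoLegacy;

/* A backend callback receives the base pointer and recovers its own type. */
static int demo_legacy_fd(DemoBase *bcontainer)
{
    DemoLegacy *container;

    assert(bcontainer != NULL);
    container = demo_container_of(bcontainer, DemoLegacy, obj);
    return container->fd;
}
```

Nothing checks at run time that the base pointer really belongs to a
DemoLegacy; that check is one of the things a QOM cast macro would add.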

Co-authored-by: Eric Auger <eric.auger@xxxxxxxxxx>
Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
Signed-off-by: Yi Liu <yi.l.liu@xxxxxxxxx>
---
  hw/vfio/as.c                         |  48 +++---
  hw/vfio/container-obj.c              | 195 +++++++++++++++++++++++
  hw/vfio/container.c                  | 224 ++++++++++++++++-----------
  hw/vfio/meson.build                  |   1 +
  hw/vfio/migration.c                  |   4 +-
  hw/vfio/pci.c                        |   4 +-
  hw/vfio/spapr.c                      |  22 +--
  include/hw/vfio/vfio-common.h        |  78 ++--------
  include/hw/vfio/vfio-container-obj.h | 154 ++++++++++++++++++
  9 files changed, 540 insertions(+), 190 deletions(-)
  create mode 100644 hw/vfio/container-obj.c
  create mode 100644 include/hw/vfio/vfio-container-obj.h

diff --git a/hw/vfio/as.c b/hw/vfio/as.c
index 4181182808..37423d2c89 100644
--- a/hw/vfio/as.c
+++ b/hw/vfio/as.c
@@ -215,9 +215,9 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
           * of vaddr will always be there, even if the memory object is
           * destroyed and its backing memory munmap-ed.
           */
-        ret = vfio_dma_map(container, iova,
-                           iotlb->addr_mask + 1, vaddr,
-                           read_only);
+        ret = vfio_container_dma_map(container, iova,
+                                     iotlb->addr_mask + 1, vaddr,
+                                     read_only);
          if (ret) {
              error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                           "0x%"HWADDR_PRIx", %p) = %d (%m)",
@@ -225,7 +225,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                           iotlb->addr_mask + 1, vaddr, ret);
          }
      } else {
-        ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
+        ret = vfio_container_dma_unmap(container, iova,
+                                       iotlb->addr_mask + 1, iotlb);
          if (ret) {
              error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                           "0x%"HWADDR_PRIx") = %d (%m)",
@@ -242,12 +243,13 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
  {
      VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                  listener);
+    VFIOContainer *container = vrdl->container;
      const hwaddr size = int128_get64(section->size);
      const hwaddr iova = section->offset_within_address_space;
      int ret;

      /* Unmap with a single call. */
-    ret = vfio_dma_unmap(vrdl->container, iova, size , NULL);
+    ret = vfio_container_dma_unmap(container, iova, size , NULL);
      if (ret) {
          error_report("%s: vfio_dma_unmap() failed: %s", __func__,
                       strerror(-ret));
@@ -259,6 +261,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
  {
      VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                  listener);
+    VFIOContainer *container = vrdl->container;
      const hwaddr end = section->offset_within_region +
                         int128_get64(section->size);
      hwaddr start, next, iova;
@@ -277,8 +280,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                 section->offset_within_address_space;
          vaddr = memory_region_get_ram_ptr(section->mr) + start;

-        ret = vfio_dma_map(vrdl->container, iova, next - start,
-                           vaddr, section->readonly);
+        ret = vfio_container_dma_map(container, iova, next - start,
+                                     vaddr, section->readonly);
          if (ret) {
              /* Rollback */
              vfio_ram_discard_notify_discard(rdl, section);
@@ -530,8 +533,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
          }
      }

-    ret = vfio_dma_map(container, iova, int128_get64(llsize),
-                       vaddr, section->readonly);
+    ret = vfio_container_dma_map(container, iova, int128_get64(llsize),
+                                 vaddr, section->readonly);
      if (ret) {
          error_setg(&err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
@@ -656,7 +659,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
          if (int128_eq(llsize, int128_2_64())) {
              /* The unmap ioctl doesn't accept a full 64-bit span. */
              llsize = int128_rshift(llsize, 1);
-            ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+            ret = vfio_container_dma_unmap(container, iova,
+                                           int128_get64(llsize), NULL);
              if (ret) {
                  error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                               "0x%"HWADDR_PRIx") = %d (%m)",
@@ -664,7 +668,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
              }
              iova += int128_get64(llsize);
          }
-        ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+        ret = vfio_container_dma_unmap(container, iova,
+                                       int128_get64(llsize), NULL);
          if (ret) {
              error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                           "0x%"HWADDR_PRIx") = %d (%m)",
@@ -681,14 +686,14 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
  {
      VFIOContainer *container = container_of(listener, VFIOContainer, listener);

-    vfio_set_dirty_page_tracking(container, true);
+    vfio_container_set_dirty_page_tracking(container, true);
  }

  static void vfio_listener_log_global_stop(MemoryListener *listener)
  {
      VFIOContainer *container = container_of(listener, VFIOContainer, listener);

-    vfio_set_dirty_page_tracking(container, false);
+    vfio_container_set_dirty_page_tracking(container, false);
  }

  typedef struct {
@@ -717,8 +722,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
      if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
          int ret;

-        ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
-                                    translated_addr);
+        ret = vfio_container_get_dirty_bitmap(container, iova,
+                                              iotlb->addr_mask + 1,
+                                              translated_addr);
          if (ret) {
              error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
                           "0x%"HWADDR_PRIx") = %d (%m)",
@@ -742,11 +748,13 @@ static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
       * Sync the whole mapped region (spanning multiple individual mappings)
       * in one go.
       */
-    return vfio_get_dirty_bitmap(vrdl->container, iova, size, ram_addr);
+    return vfio_container_get_dirty_bitmap(vrdl->container, iova,
+                                           size, ram_addr);
  }

-static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
-                                                   MemoryRegionSection *section)
+static int
+vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
+                                            MemoryRegionSection *section)
  {
      RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
      VFIORamDiscardListener *vrdl = NULL;
@@ -810,7 +818,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
      ram_addr = memory_region_get_ram_addr(section->mr) +
                 section->offset_within_region;

-    return vfio_get_dirty_bitmap(container,
+    return vfio_container_get_dirty_bitmap(container,
                     REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
                     int128_get64(section->size), ram_addr);
  }
@@ -825,7 +833,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
          return;
      }

-    if (vfio_devices_all_dirty_tracking(container)) {
+    if (vfio_container_devices_all_dirty_tracking(container)) {
          vfio_sync_dirty_bitmap(container, section);
      }
  }
diff --git a/hw/vfio/container-obj.c b/hw/vfio/container-obj.c
new file mode 100644
index 0000000000..40c1e2a2b5
--- /dev/null
+++ b/hw/vfio/container-obj.c
@@ -0,0 +1,195 @@
+/*
+ * VFIO CONTAINER BASE OBJECT
+ *
+ * Copyright (C) 2022 Intel Corporation.
+ * Copyright Red Hat, Inc. 2022
+ *
+ * Authors: Yi Liu <yi.l.liu@xxxxxxxxx>
+ *          Eric Auger <eric.auger@xxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qom/object.h"
+#include "qapi/visitor.h"
+#include "hw/vfio/vfio-container-obj.h"
+
+bool vfio_container_check_extension(VFIOContainer *container,
+                                    VFIOContainerFeature feat)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->check_extension) {
+        return false;
+    }
+
+    return vccs->check_extension(container, feat);
+}
+
+int vfio_container_dma_map(VFIOContainer *container,
+                           hwaddr iova, ram_addr_t size,
+                           void *vaddr, bool readonly)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->dma_map) {
+        return -EINVAL;
+    }
+
+    return vccs->dma_map(container, iova, size, vaddr, readonly);
+}
+
+int vfio_container_dma_unmap(VFIOContainer *container,
+                             hwaddr iova, ram_addr_t size,
+                             IOMMUTLBEntry *iotlb)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->dma_unmap) {
+        return -EINVAL;
+    }
+
+    return vccs->dma_unmap(container, iova, size, iotlb);
+}
+
+void vfio_container_set_dirty_page_tracking(VFIOContainer *container,
+                                            bool start)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->set_dirty_page_tracking) {
+        return;
+    }
+
+    vccs->set_dirty_page_tracking(container, start);
+}
+
+bool vfio_container_devices_all_dirty_tracking(VFIOContainer *container)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->devices_all_dirty_tracking) {
+        return false;
+    }
+
+    return vccs->devices_all_dirty_tracking(container);
+}
+
+int vfio_container_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                    uint64_t size, ram_addr_t ram_addr)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->get_dirty_bitmap) {
+        return -EINVAL;
+    }
+
+    return vccs->get_dirty_bitmap(container, iova, size, ram_addr);
+}
+
+int vfio_container_add_section_window(VFIOContainer *container,
+                                      MemoryRegionSection *section,
+                                      Error **errp)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->add_window) {
+        return 0;
+    }
+
+    return vccs->add_window(container, section, errp);
+}
+
+void vfio_container_del_section_window(VFIOContainer *container,
+                                       MemoryRegionSection *section)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_GET_CLASS(container);
+
+    if (!vccs->del_window) {
+        return;
+    }
+
+    vccs->del_window(container, section);
+}
+
+void vfio_container_init(void *_container, size_t instance_size,
+                         const char *mrtypename,
+                         VFIOAddressSpace *space)
+{
+    VFIOContainer *container;
+
+    object_initialize(_container, instance_size, mrtypename);
+    container = VFIO_CONTAINER_OBJ(_container);
+
+    container->space = space;
+    container->error = NULL;
+    container->dirty_pages_supported = false;
+    container->dma_max_mappings = 0;
+    QLIST_INIT(&container->giommu_list);
+    QLIST_INIT(&container->hostwin_list);
+    QLIST_INIT(&container->vrdl_list);
+}
+
+void vfio_container_destroy(VFIOContainer *container)
+{
+    VFIORamDiscardListener *vrdl, *vrdl_tmp;
+    VFIOGuestIOMMU *giommu, *tmp;
+    VFIOHostDMAWindow *hostwin, *next;
+
+    QLIST_SAFE_REMOVE(container, next);
+
+    QLIST_FOREACH_SAFE(vrdl, &container->vrdl_list, next, vrdl_tmp) {
+        RamDiscardManager *rdm;
+
+        rdm = memory_region_get_ram_discard_manager(vrdl->mr);
+        ram_discard_manager_unregister_listener(rdm, &vrdl->listener);
+        QLIST_REMOVE(vrdl, next);
+        g_free(vrdl);
+    }
+
+    QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
+        memory_region_unregister_iommu_notifier(
+                MEMORY_REGION(giommu->iommu_mr), &giommu->n);
+        QLIST_REMOVE(giommu, giommu_next);
+        g_free(giommu);
+    }
+
+    QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
+                       next) {
+        QLIST_REMOVE(hostwin, hostwin_next);
+        g_free(hostwin);
+    }
+
+    object_unref(&container->parent_obj);
+}
+
+static const TypeInfo vfio_container_info = {
+    .parent             = TYPE_OBJECT,
+    .name               = TYPE_VFIO_CONTAINER_OBJ,
+    .class_size         = sizeof(VFIOContainerClass),
+    .instance_size      = sizeof(VFIOContainer),
+    .abstract           = true,
+};
+
+static void vfio_container_register_types(void)
+{
+    type_register_static(&vfio_container_info);
+}
+
+type_init(vfio_container_register_types)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 9c665c1720..79972064d3 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -50,6 +50,8 @@
  static int vfio_kvm_device_fd = -1;
  #endif
+#define TYPE_VFIO_LEGACY_CONTAINER "qemu:vfio-legacy-container"
+
  VFIOGroupList vfio_group_list =
      QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -76,8 +78,10 @@ bool vfio_mig_active(void)
      return true;
  }
-bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
+static bool vfio_devices_all_dirty_tracking(VFIOContainer *bcontainer)
  {
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
      VFIOGroup *group;
      VFIODevice *vbasedev;
      MigrationState *ms = migrate_get_current();
@@ -103,7 +107,7 @@ bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
      return true;
  }
-bool vfio_devices_all_running_and_saving(VFIOContainer *container)
+static bool vfio_devices_all_running_and_saving(VFIOLegacyContainer *container)
  {
      VFIOGroup *group;
      VFIODevice *vbasedev;
@@ -132,10 +136,11 @@ bool vfio_devices_all_running_and_saving(VFIOContainer *container)
      return true;
  }
-static int vfio_dma_unmap_bitmap(VFIOContainer *container,
+static int vfio_dma_unmap_bitmap(VFIOLegacyContainer *container,
                                   hwaddr iova, ram_addr_t size,
                                   IOMMUTLBEntry *iotlb)
  {
+    VFIOContainer *bcontainer = &container->obj;
      struct vfio_iommu_type1_dma_unmap *unmap;
      struct vfio_bitmap *bitmap;
      uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size;
@@ -159,7 +164,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
      bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
                     BITS_PER_BYTE;

-    if (bitmap->size > container->max_dirty_bitmap_size) {
+    if (bitmap->size > bcontainer->max_dirty_bitmap_size) {
          error_report("UNMAP: Size of bitmap too big 0x%"PRIx64,
                       (uint64_t)bitmap->size);
          ret = -E2BIG;
@@ -189,10 +194,12 @@ unmap_exit:
  /*
   * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
   */
-int vfio_dma_unmap(VFIOContainer *container,
-                   hwaddr iova, ram_addr_t size,
-                   IOMMUTLBEntry *iotlb)
+static int vfio_dma_unmap(VFIOContainer *bcontainer,
+                          hwaddr iova, ram_addr_t size,
+                          IOMMUTLBEntry *iotlb)
  {
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
      struct vfio_iommu_type1_dma_unmap unmap = {
          .argsz = sizeof(unmap),
          .flags = 0,
@@ -200,7 +207,7 @@ int vfio_dma_unmap(VFIOContainer *container,
          .size = size,
      };

-    if (iotlb && container->dirty_pages_supported &&
+    if (iotlb && bcontainer->dirty_pages_supported &&
          vfio_devices_all_running_and_saving(container)) {
          return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
      }
@@ -221,7 +228,7 @@ int vfio_dma_unmap(VFIOContainer *container,
          if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
              container->iommu_type == VFIO_TYPE1v2_IOMMU) {
              trace_vfio_dma_unmap_overflow_workaround();
-            unmap.size -= 1ULL << ctz64(container->pgsizes);
+            unmap.size -= 1ULL << ctz64(bcontainer->pgsizes);
              continue;
          }
          error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
@@ -231,9 +238,22 @@ int vfio_dma_unmap(VFIOContainer *container,
      return 0;
  }
-int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                 ram_addr_t size, void *vaddr, bool readonly)
+static bool vfio_legacy_container_check_extension(VFIOContainer *bcontainer,
+                                                  VFIOContainerFeature feat)
  {
+    switch (feat) {
+    case VFIO_FEAT_LIVE_MIGRATION:
+        return true;
+    default:
+        return false;
+    }
+}
+
+static int vfio_dma_map(VFIOContainer *bcontainer, hwaddr iova,
+                       ram_addr_t size, void *vaddr, bool readonly)
+{
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
      struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_READ,
@@ -252,7 +272,7 @@ int vfio_dma_map(VFIOContainer *container, hwaddr iova,
       * the VGA ROM space.
       */
      if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
+        (errno == EBUSY && vfio_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
           ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
          return 0;
      }
@@ -261,8 +281,10 @@ int vfio_dma_map(VFIOContainer *container, hwaddr iova,
      return -errno;
  }
-void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
+static void vfio_set_dirty_page_tracking(VFIOContainer *bcontainer, bool start)
  {
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
      int ret;
      struct vfio_iommu_type1_dirty_bitmap dirty = {
          .argsz = sizeof(dirty),
@@ -281,9 +303,11 @@ void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
      }
  }
-int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
-                          uint64_t size, ram_addr_t ram_addr)
+static int vfio_get_dirty_bitmap(VFIOContainer *bcontainer, uint64_t iova,
+                                 uint64_t size, ram_addr_t ram_addr)
  {
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
      struct vfio_iommu_type1_dirty_bitmap *dbitmap;
      struct vfio_iommu_type1_dirty_bitmap_get *range;
      uint64_t pages;
@@ -333,18 +357,23 @@ err_out:
      return ret;
  }
-static void vfio_listener_release(VFIOContainer *container)
+static void vfio_listener_release(VFIOLegacyContainer *container)
  {
-    memory_listener_unregister(&container->listener);
+    VFIOContainer *bcontainer = &container->obj;
+
+    memory_listener_unregister(&bcontainer->listener);
      if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
          memory_listener_unregister(&container->prereg_listener);
      }
  }
-int vfio_container_add_section_window(VFIOContainer *container,
-                                      MemoryRegionSection *section,
-                                      Error **errp)
+static int
+vfio_legacy_container_add_section_window(VFIOContainer *bcontainer,
+                                         MemoryRegionSection *section,
+                                         Error **errp)
  {
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
      VFIOHostDMAWindow *hostwin;
      hwaddr pgsize = 0;
      int ret;
@@ -354,7 +383,7 @@ int vfio_container_add_section_window(VFIOContainer *container,
      }

      /* For now intersections are not allowed, we may relax this later */
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+    QLIST_FOREACH(hostwin, &bcontainer->hostwin_list, hostwin_next) {
          if (ranges_overlap(hostwin->min_iova,
                             hostwin->max_iova - hostwin->min_iova + 1,
                             section->offset_within_address_space,
@@ -376,7 +405,7 @@ int vfio_container_add_section_window(VFIOContainer *container,
          return ret;
      }

-    vfio_host_win_add(container, section->offset_within_address_space,
+    vfio_host_win_add(bcontainer, section->offset_within_address_space,
                        section->offset_within_address_space +
                        int128_get64(section->size) - 1, pgsize);
  #ifdef CONFIG_KVM
@@ -409,16 +438,20 @@ int vfio_container_add_section_window(VFIOContainer *container,
      return 0;
  }
-void vfio_container_del_section_window(VFIOContainer *container,
-                                       MemoryRegionSection *section)
+static void
+vfio_legacy_container_del_section_window(VFIOContainer *bcontainer,
+                                         MemoryRegionSection *section)
  {
+    VFIOLegacyContainer *container = container_of(bcontainer,
+                                                  VFIOLegacyContainer, obj);
+
      if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
          return;
      }

      vfio_spapr_remove_window(container,
                               section->offset_within_address_space);
-    if (vfio_host_win_del(container,
+    if (vfio_host_win_del(bcontainer,
                            section->offset_within_address_space,
                            section->offset_within_address_space +
                            int128_get64(section->size) - 1) < 0) {
@@ -505,7 +538,7 @@ static void vfio_kvm_device_del_group(VFIOGroup *group)
  /*
   * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
   */
-static int vfio_get_iommu_type(VFIOContainer *container,
+static int vfio_get_iommu_type(VFIOLegacyContainer *container,
                                 Error **errp)
  {
      int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
@@ -521,7 +554,7 @@ static int vfio_get_iommu_type(VFIOContainer *container,
      return -EINVAL;
  }
-static int vfio_init_container(VFIOContainer *container, int group_fd,
+static int vfio_init_container(VFIOLegacyContainer *container, int group_fd,
                                 Error **errp)
  {
      int iommu_type, ret;
@@ -556,7 +589,7 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
      return 0;
  }
-static int vfio_get_iommu_info(VFIOContainer *container,
+static int vfio_get_iommu_info(VFIOLegacyContainer *container,
                                 struct vfio_iommu_type1_info **info)
  {
@@ -600,11 +633,12 @@ vfio_get_iommu_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
      return NULL;
  }
-static void vfio_get_iommu_info_migration(VFIOContainer *container,
-                                         struct vfio_iommu_type1_info *info)
+static void vfio_get_iommu_info_migration(VFIOLegacyContainer *container,
+                                          struct vfio_iommu_type1_info *info)
  {
      struct vfio_info_cap_header *hdr;
      struct vfio_iommu_type1_info_cap_migration *cap_mig;
+    VFIOContainer *bcontainer = &container->obj;

      hdr = vfio_get_iommu_info_cap(info, VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION);
      if (!hdr) {
@@ -619,13 +653,14 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
       * qemu_real_host_page_size to mark those dirty.
       */
      if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) {
-        container->dirty_pages_supported = true;
-        container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
-        container->dirty_pgsizes = cap_mig->pgsize_bitmap;
+        bcontainer->dirty_pages_supported = true;
+        bcontainer->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
+        bcontainer->dirty_pgsizes = cap_mig->pgsize_bitmap;
      }
  }
-static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
+static int
+vfio_ram_block_discard_disable(VFIOLegacyContainer *container, bool state)
  {
      switch (container->iommu_type) {
      case VFIO_TYPE1v2_IOMMU:
@@ -651,7 +686,8 @@ static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
  static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                    Error **errp)
  {
-    VFIOContainer *container;
+    VFIOContainer *bcontainer;
+    VFIOLegacyContainer *container;
      int ret, fd;
      VFIOAddressSpace *space;
@@ -688,7 +724,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
       * details once we know which type of IOMMU we are using.
       */

-    QLIST_FOREACH(container, &space->containers, next) {
+    QLIST_FOREACH(bcontainer, &space->containers, next) {
+        container = container_of(bcontainer, VFIOLegacyContainer, obj);
          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
              ret = vfio_ram_block_discard_disable(container, true);
              if (ret) {
@@ -724,14 +761,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
      }

      container = g_malloc0(sizeof(*container));
-    container->space = space;
      container->fd = fd;
-    container->error = NULL;
-    container->dirty_pages_supported = false;
-    container->dma_max_mappings = 0;
-    QLIST_INIT(&container->giommu_list);
-    QLIST_INIT(&container->hostwin_list);
-    QLIST_INIT(&container->vrdl_list);
+    bcontainer = &container->obj;
+    vfio_container_init(bcontainer, sizeof(*bcontainer),
+                        TYPE_VFIO_LEGACY_CONTAINER, space);

      ret = vfio_init_container(container, group->fd, errp);
      if (ret) {
@@ -763,13 +796,13 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
              /* Assume 4k IOVA page size */
              info->iova_pgsizes = 4096;
          }
-        vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
-        container->pgsizes = info->iova_pgsizes;
+        vfio_host_win_add(bcontainer, 0, (hwaddr)-1, info->iova_pgsizes);
+        bcontainer->pgsizes = info->iova_pgsizes;

          /* The default in the kernel ("dma_entry_limit") is 65535. */
-        container->dma_max_mappings = 65535;
+        bcontainer->dma_max_mappings = 65535;
          if (!ret) {
-            vfio_get_info_dma_avail(info, &container->dma_max_mappings);
+            vfio_get_info_dma_avail(info, &bcontainer->dma_max_mappings);
              vfio_get_iommu_info_migration(container, info);
          }
          g_free(info);
@@ -798,10 +831,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,

              memory_listener_register(&container->prereg_listener,
                                       &address_space_memory);
-            if (container->error) {
+            if (bcontainer->error) {
                  memory_listener_unregister(&container->prereg_listener);
                  ret = -1;
-                error_propagate_prepend(errp, container->error,
+                error_propagate_prepend(errp, bcontainer->error,
                      "RAM memory listener initialization failed: ");
                  goto enable_discards_exit;
              }
@@ -820,7 +853,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
          }
          if (v2) {
-            container->pgsizes = info.ddw.pgsizes;
+            bcontainer->pgsizes = info.ddw.pgsizes;
              /*
               * There is a default window in just created container.
               * To make region_add/del simpler, we better remove this
@@ -835,8 +868,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
              }
          } else {
              /* The default table uses 4K pages */
-            container->pgsizes = 0x1000;
-            vfio_host_win_add(container, info.dma32_window_start,
+            bcontainer->pgsizes = 0x1000;
+            vfio_host_win_add(bcontainer, info.dma32_window_start,
                                info.dma32_window_start +
                                info.dma32_window_size - 1,
                                0x1000);
@@ -847,28 +880,28 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
      vfio_kvm_device_add_group(group);
      QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
+    QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
      group->container = container;
      QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-    container->listener = vfio_memory_listener;
+    bcontainer->listener = vfio_memory_listener;
-    memory_listener_register(&container->listener, container->space->as);
+    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
-    if (container->error) {
+    if (bcontainer->error) {
          ret = -1;
-        error_propagate_prepend(errp, container->error,
+        error_propagate_prepend(errp, bcontainer->error,
              "memory listener initialization failed: ");
          goto listener_release_exit;
      }
-    container->initialized = true;
+    bcontainer->initialized = true;
      return 0;
  listener_release_exit:
      QLIST_REMOVE(group, container_next);
-    QLIST_REMOVE(container, next);
+    QLIST_REMOVE(bcontainer, next);
      vfio_kvm_device_del_group(group);
      vfio_listener_release(container);
@@ -889,7 +922,8 @@ put_space_exit:
  static void vfio_disconnect_container(VFIOGroup *group)
  {
-    VFIOContainer *container = group->container;
+    VFIOLegacyContainer *container = group->container;
+    VFIOContainer *bcontainer = &container->obj;
      QLIST_REMOVE(group, container_next);
      group->container = NULL;
@@ -909,25 +943,9 @@ static void vfio_disconnect_container(VFIOGroup *group)
      }
      if (QLIST_EMPTY(&container->group_list)) {
-        VFIOAddressSpace *space = container->space;
-        VFIOGuestIOMMU *giommu, *tmp;
-        VFIOHostDMAWindow *hostwin, *next;
-
-        QLIST_REMOVE(container, next);
-
-        QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
-            memory_region_unregister_iommu_notifier(
-                    MEMORY_REGION(giommu->iommu_mr), &giommu->n);
-            QLIST_REMOVE(giommu, giommu_next);
-            g_free(giommu);
-        }
-
-        QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
-                           next) {
-            QLIST_REMOVE(hostwin, hostwin_next);
-            g_free(hostwin);
-        }
+        VFIOAddressSpace *space = bcontainer->space;
+        vfio_container_destroy(bcontainer);
          trace_vfio_disconnect_container(container->fd);
          close(container->fd);
          g_free(container);
@@ -939,13 +957,15 @@ static void vfio_disconnect_container(VFIOGroup *group)
  VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
  {
      VFIOGroup *group;
+    VFIOContainer *bcontainer;
      char path[32];
      struct vfio_group_status status = { .argsz = sizeof(status) };
      QLIST_FOREACH(group, &vfio_group_list, next) {
          if (group->groupid == groupid) {
              /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
+            bcontainer = &group->container->obj;
+            if (bcontainer->space->as == as) {
                  return group;
              } else {
                  error_setg(errp, "group %d used in multiple address spaces",
@@ -1098,7 +1118,7 @@ void vfio_put_base_device(VFIODevice *vbasedev)
  /*
   * Interfaces for IBM EEH (Enhanced Error Handling)
   */
-static bool vfio_eeh_container_ok(VFIOContainer *container)
+static bool vfio_eeh_container_ok(VFIOLegacyContainer *container)
  {
      /*
       * As of 2016-03-04 (linux-4.5) the host kernel EEH/VFIO
@@ -1126,7 +1146,7 @@ static bool vfio_eeh_container_ok(VFIOContainer *container)
      return true;
  }
-static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
+static int vfio_eeh_container_op(VFIOLegacyContainer *container, uint32_t op)
  {
      struct vfio_eeh_pe_op pe_op = {
          .argsz = sizeof(pe_op),
@@ -1149,19 +1169,21 @@ static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
      return ret;
  }
-static VFIOContainer *vfio_eeh_as_container(AddressSpace *as)
+static VFIOLegacyContainer *vfio_eeh_as_container(AddressSpace *as)
  {
      VFIOAddressSpace *space = vfio_get_address_space(as);
-    VFIOContainer *container = NULL;
+    VFIOLegacyContainer *container = NULL;
+    VFIOContainer *bcontainer = NULL;
      if (QLIST_EMPTY(&space->containers)) {
          /* No containers to act on */
          goto out;
      }
-    container = QLIST_FIRST(&space->containers);
+    bcontainer = QLIST_FIRST(&space->containers);
+    container = container_of(bcontainer, VFIOLegacyContainer, obj);
-    if (QLIST_NEXT(container, next)) {
+    if (QLIST_NEXT(bcontainer, next)) {
          /*
           * We don't yet have logic to synchronize EEH state across
           * multiple containers.
@@ -1177,17 +1199,45 @@ out:
  bool vfio_eeh_as_ok(AddressSpace *as)
  {
-    VFIOContainer *container = vfio_eeh_as_container(as);
+    VFIOLegacyContainer *container = vfio_eeh_as_container(as);
      return (container != NULL) && vfio_eeh_container_ok(container);
  }
  int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
  {
-    VFIOContainer *container = vfio_eeh_as_container(as);
+    VFIOLegacyContainer *container = vfio_eeh_as_container(as);
      if (!container) {
          return -ENODEV;
      }
      return vfio_eeh_container_op(container, op);
  }
+
+static void vfio_legacy_container_class_init(ObjectClass *klass,
+                                             void *data)
+{
+    VFIOContainerClass *vccs = VFIO_CONTAINER_OBJ_CLASS(klass);
+
+    vccs->dma_map = vfio_dma_map;
+    vccs->dma_unmap = vfio_dma_unmap;
+    vccs->devices_all_dirty_tracking = vfio_devices_all_dirty_tracking;
+    vccs->set_dirty_page_tracking = vfio_set_dirty_page_tracking;
+    vccs->get_dirty_bitmap = vfio_get_dirty_bitmap;
+    vccs->add_window = vfio_legacy_container_add_section_window;
+    vccs->del_window = vfio_legacy_container_del_section_window;
+    vccs->check_extension = vfio_legacy_container_check_extension;
+}
+
+static const TypeInfo vfio_legacy_container_info = {
+    .parent = TYPE_VFIO_CONTAINER_OBJ,
+    .name = TYPE_VFIO_LEGACY_CONTAINER,
+    .class_init = vfio_legacy_container_class_init,
+};
+
+static void vfio_register_types(void)
+{
+    type_register_static(&vfio_legacy_container_info);
+}
+
+type_init(vfio_register_types)
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index e3b6d6e2cb..df4fa2b695 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -2,6 +2,7 @@ vfio_ss = ss.source_set()
  vfio_ss.add(files(
    'common.c',
    'as.c',
+  'container-obj.c',
    'container.c',
    'spapr.c',
    'migration.c',
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ff6b45de6b..cbbde177c3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -856,11 +856,11 @@ int64_t vfio_mig_bytes_transferred(void)
  int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
  {
-    VFIOContainer *container = vbasedev->group->container;
+    VFIOLegacyContainer *container = vbasedev->group->container;
      struct vfio_region_info *info = NULL;
      int ret = -ENOTSUP;
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration || !container->obj.dirty_pages_supported) {
          goto add_blocker;
      }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e707329394..a00a485e46 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3101,7 +3101,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
          }
      }
-    if (!pdev->failover_pair_id) {
+    if (!pdev->failover_pair_id &&
+        vfio_container_check_extension(&vbasedev->group->container->obj,
+                                       VFIO_FEAT_LIVE_MIGRATION)) {
          ret = vfio_migration_probe(vbasedev, errp);
          if (ret) {
              error_report("%s: Migration disabled", vbasedev->name);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 04c6e67f8f..cdcd9e05ba 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -39,8 +39,8 @@ static void *vfio_prereg_gpa_to_vaddr(MemoryRegionSection *section, hwaddr gpa)
  static void vfio_prereg_listener_region_add(MemoryListener *listener,
                                              MemoryRegionSection *section)
  {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            prereg_listener);
+    VFIOLegacyContainer *container = container_of(listener, VFIOLegacyContainer,
+                                                  prereg_listener);
      const hwaddr gpa = section->offset_within_address_space;
      hwaddr end;
      int ret;
@@ -83,9 +83,9 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
           * can gracefully fail.  Runtime, there's not much we can do other
           * than throw a hardware error.
           */
-        if (!container->initialized) {
-            if (!container->error) {
-                error_setg_errno(&container->error, -ret,
+        if (!container->obj.initialized) {
+            if (!container->obj.error) {
+                error_setg_errno(&container->obj.error, -ret,
                                   "Memory registering failed");
              }
          } else {
@@ -97,8 +97,8 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
  static void vfio_prereg_listener_region_del(MemoryListener *listener,
                                              MemoryRegionSection *section)
  {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            prereg_listener);
+    VFIOLegacyContainer *container = container_of(listener, VFIOLegacyContainer,
+                                                  prereg_listener);
      const hwaddr gpa = section->offset_within_address_space;
      hwaddr end;
      int ret;
@@ -141,7 +141,7 @@ const MemoryListener vfio_prereg_listener = {
      .region_del = vfio_prereg_listener_region_del,
  };
-int vfio_spapr_create_window(VFIOContainer *container,
+int vfio_spapr_create_window(VFIOLegacyContainer *container,
                               MemoryRegionSection *section,
                               hwaddr *pgsize)
  {
@@ -159,13 +159,13 @@ int vfio_spapr_create_window(VFIOContainer *container,
      if (pagesize > rampagesize) {
          pagesize = rampagesize;
      }
-    pgmask = container->pgsizes & (pagesize | (pagesize - 1));
+    pgmask = container->obj.pgsizes & (pagesize | (pagesize - 1));
      pagesize = pgmask ? (1ULL << (63 - clz64(pgmask))) : 0;
      if (!pagesize) {
          error_report("Host doesn't support page size 0x%"PRIx64
                       ", the supported mask is 0x%lx",
                       memory_region_iommu_get_min_page_size(iommu_mr),
-                     container->pgsizes);
+                     container->obj.pgsizes);
          return -EINVAL;
      }
@@ -233,7 +233,7 @@ int vfio_spapr_create_window(VFIOContainer *container,
      return 0;
  }
-int vfio_spapr_remove_window(VFIOContainer *container,
+int vfio_spapr_remove_window(VFIOLegacyContainer *container,
                               hwaddr offset_within_address_space)
  {
      struct vfio_iommu_spapr_tce_remove remove = {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 03ff7944cb..02a6f36a9e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -30,6 +30,7 @@
  #include <linux/vfio.h>
  #endif
  #include "sysemu/sysemu.h"
+#include "hw/vfio/vfio-container-obj.h"
  #define VFIO_MSG_PREFIX "vfio %s: "
@@ -70,58 +71,15 @@ typedef struct VFIOMigration {
      uint64_t pending_bytes;
  } VFIOMigration;
-typedef struct VFIOAddressSpace {
-    AddressSpace *as;
-    QLIST_HEAD(, VFIOContainer) containers;
-    QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
  struct VFIOGroup;
-typedef struct VFIOContainer {
-    VFIOAddressSpace *space;
+typedef struct VFIOLegacyContainer {
+    VFIOContainer obj;
      int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    MemoryListener listener;
      MemoryListener prereg_listener;
      unsigned iommu_type;
-    Error *error;
-    bool initialized;
-    bool dirty_pages_supported;
-    uint64_t dirty_pgsizes;
-    uint64_t max_dirty_bitmap_size;
-    unsigned long pgsizes;
-    unsigned int dma_max_mappings;
-    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
-    QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
      QLIST_HEAD(, VFIOGroup) group_list;
-    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
-    QLIST_ENTRY(VFIOContainer) next;
-} VFIOContainer;
-
-typedef struct VFIOGuestIOMMU {
-    VFIOContainer *container;
-    IOMMUMemoryRegion *iommu_mr;
-    hwaddr iommu_offset;
-    IOMMUNotifier n;
-    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
-typedef struct VFIORamDiscardListener {
-    VFIOContainer *container;
-    MemoryRegion *mr;
-    hwaddr offset_within_address_space;
-    hwaddr size;
-    uint64_t granularity;
-    RamDiscardListener listener;
-    QLIST_ENTRY(VFIORamDiscardListener) next;
-} VFIORamDiscardListener;
-
-typedef struct VFIOHostDMAWindow {
-    hwaddr min_iova;
-    hwaddr max_iova;
-    uint64_t iova_pgsizes;
-    QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
-} VFIOHostDMAWindow;
+} VFIOLegacyContainer;
  typedef struct VFIODeviceOps VFIODeviceOps;
@@ -159,7 +117,7 @@ struct VFIODeviceOps {
  typedef struct VFIOGroup {
      int fd;
      int groupid;
-    VFIOContainer *container;
+    VFIOLegacyContainer *container;
      QLIST_HEAD(, VFIODevice) device_list;
      QLIST_ENTRY(VFIOGroup) next;
      QLIST_ENTRY(VFIOGroup) container_next;
@@ -192,31 +150,13 @@ typedef struct VFIODisplay {
      } dmabuf;
  } VFIODisplay;
-void vfio_host_win_add(VFIOContainer *container,
+void vfio_host_win_add(VFIOContainer *bcontainer,
                         hwaddr min_iova, hwaddr max_iova,
                         uint64_t iova_pgsizes);
-int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
+int vfio_host_win_del(VFIOContainer *bcontainer, hwaddr min_iova,
                        hwaddr max_iova);
  VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
  void vfio_put_address_space(VFIOAddressSpace *space);
-bool vfio_devices_all_running_and_saving(VFIOContainer *container);
-bool vfio_devices_all_dirty_tracking(VFIOContainer *container);
-
-/* container->fd */
-int vfio_dma_unmap(VFIOContainer *container,
-                   hwaddr iova, ram_addr_t size,
-                   IOMMUTLBEntry *iotlb);
-int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                 ram_addr_t size, void *vaddr, bool readonly);
-void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
-int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
-                          uint64_t size, ram_addr_t ram_addr);
-
-int vfio_container_add_section_window(VFIOContainer *container,
-                                      MemoryRegionSection *section,
-                                      Error **errp);
-void vfio_container_del_section_window(VFIOContainer *container,
-                                       MemoryRegionSection *section);
  void vfio_put_base_device(VFIODevice *vbasedev);
  void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
@@ -263,10 +203,10 @@ vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
  #endif
  extern const MemoryListener vfio_prereg_listener;
-int vfio_spapr_create_window(VFIOContainer *container,
+int vfio_spapr_create_window(VFIOLegacyContainer *container,
                               MemoryRegionSection *section,
                               hwaddr *pgsize);
-int vfio_spapr_remove_window(VFIOContainer *container,
+int vfio_spapr_remove_window(VFIOLegacyContainer *container,
                               hwaddr offset_within_address_space);
  int vfio_migration_probe(VFIODevice *vbasedev, Error **errp);
diff --git a/include/hw/vfio/vfio-container-obj.h b/include/hw/vfio/vfio-container-obj.h
new file mode 100644
index 0000000000..7ffbbb299f
--- /dev/null
+++ b/include/hw/vfio/vfio-container-obj.h
@@ -0,0 +1,154 @@
+/*
+ * VFIO CONTAINER BASE OBJECT
+ *
+ * Copyright (C) 2022 Intel Corporation.
+ * Copyright Red Hat, Inc. 2022
+ *
+ * Authors: Yi Liu <yi.l.liu@xxxxxxxxx>
+ *          Eric Auger <eric.auger@xxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_VFIO_VFIO_CONTAINER_OBJ_H
+#define HW_VFIO_VFIO_CONTAINER_OBJ_H
+
+#include "qom/object.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "qemu/thread.h"
+#ifndef CONFIG_USER_ONLY
+#include "exec/hwaddr.h"
+#endif
+
+#define TYPE_VFIO_CONTAINER_OBJ "qemu:vfio-base-container-obj"
+#define VFIO_CONTAINER_OBJ(obj) \
+        OBJECT_CHECK(VFIOContainer, (obj), TYPE_VFIO_CONTAINER_OBJ)
+#define VFIO_CONTAINER_OBJ_CLASS(klass) \
+        OBJECT_CLASS_CHECK(VFIOContainerClass, (klass), \
+                         TYPE_VFIO_CONTAINER_OBJ)
+#define VFIO_CONTAINER_OBJ_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(VFIOContainerClass, (obj), \
+                         TYPE_VFIO_CONTAINER_OBJ)
+
+typedef enum VFIOContainerFeature {
+    VFIO_FEAT_LIVE_MIGRATION,
+} VFIOContainerFeature;
+
+typedef struct VFIOContainer VFIOContainer;
+
+typedef struct VFIOAddressSpace {
+    AddressSpace *as;
+    QLIST_HEAD(, VFIOContainer) containers;
+    QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
+typedef struct VFIOGuestIOMMU {
+    VFIOContainer *container;
+    IOMMUMemoryRegion *iommu_mr;
+    hwaddr iommu_offset;
+    IOMMUNotifier n;
+    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
+typedef struct VFIORamDiscardListener {
+    VFIOContainer *container;
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    hwaddr size;
+    uint64_t granularity;
+    RamDiscardListener listener;
+    QLIST_ENTRY(VFIORamDiscardListener) next;
+} VFIORamDiscardListener;
+
+typedef struct VFIOHostDMAWindow {
+    hwaddr min_iova;
+    hwaddr max_iova;
+    uint64_t iova_pgsizes;
+    QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
+} VFIOHostDMAWindow;
+
+/*
+ * This is the base object for vfio container backends
+ */
+struct VFIOContainer {
+    /* private */
+    Object parent_obj;
+
+    VFIOAddressSpace *space;
+    MemoryListener listener;
+    Error *error;
+    bool initialized;
+    bool dirty_pages_supported;
+    uint64_t dirty_pgsizes;
+    uint64_t max_dirty_bitmap_size;
+    unsigned long pgsizes;
+    unsigned int dma_max_mappings;
+    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
+    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
+    QLIST_ENTRY(VFIOContainer) next;
+};
+
+typedef struct VFIOContainerClass {
+    /* private */
+    ObjectClass parent_class;
+
+    /* required */
+    bool (*check_extension)(VFIOContainer *container,
+                            VFIOContainerFeature feat);
+    int (*dma_map)(VFIOContainer *container,
+                   hwaddr iova, ram_addr_t size,
+                   void *vaddr, bool readonly);
+    int (*dma_unmap)(VFIOContainer *container,
+                     hwaddr iova, ram_addr_t size,
+                     IOMMUTLBEntry *iotlb);
+    /* migration feature */
+    bool (*devices_all_dirty_tracking)(VFIOContainer *container);
+    void (*set_dirty_page_tracking)(VFIOContainer *container, bool start);
+    int (*get_dirty_bitmap)(VFIOContainer *container, uint64_t iova,
+                            uint64_t size, ram_addr_t ram_addr);
+
+    /* SPAPR specific */
+    int (*add_window)(VFIOContainer *container,
+                      MemoryRegionSection *section,
+                      Error **errp);
+    void (*del_window)(VFIOContainer *container,
+                       MemoryRegionSection *section);
+} VFIOContainerClass;
+
+bool vfio_container_check_extension(VFIOContainer *container,
+                                    VFIOContainerFeature feat);
+int vfio_container_dma_map(VFIOContainer *container,
+                           hwaddr iova, ram_addr_t size,
+                           void *vaddr, bool readonly);
+int vfio_container_dma_unmap(VFIOContainer *container,
+                             hwaddr iova, ram_addr_t size,
+                             IOMMUTLBEntry *iotlb);
+bool vfio_container_devices_all_dirty_tracking(VFIOContainer *container);
+void vfio_container_set_dirty_page_tracking(VFIOContainer *container,
+                                            bool start);
+int vfio_container_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+                                    uint64_t size, ram_addr_t ram_addr);
+int vfio_container_add_section_window(VFIOContainer *container,
+                                      MemoryRegionSection *section,
+                                      Error **errp);
+void vfio_container_del_section_window(VFIOContainer *container,
+                                       MemoryRegionSection *section);
+
+void vfio_container_init(void *_container, size_t instance_size,
+                         const char *mrtypename,
+                         VFIOAddressSpace *space);
+void vfio_container_destroy(VFIOContainer *container);
+#endif /* HW_VFIO_VFIO_CONTAINER_OBJ_H */
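
For the "abstraction within vfio" alternative discussed above, a rough sketch of how the same split could look without QOM is below. It keeps the two ideas the patch relies on: the backend embeds the base object as a member and recovers itself with container_of(), and generic code only calls wrappers that dispatch through a callback table. All names here (BaseContainer, LegacyContainer, ContainerOps, the base_container_* wrappers, demo_dispatch) are made up for illustration and are not QEMU code; the single page_size field is a simplification of the pgsizes bitmask, and no real ioctl is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the QEMU types. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct BaseContainer BaseContainer;

/* Per-backend callbacks, playing the role of VFIOContainerClass. */
typedef struct ContainerOps {
    int (*dma_map)(BaseContainer *bc, unsigned long iova, unsigned long size);
    bool (*check_extension)(BaseContainer *bc, int feat);
} ContainerOps;

/* Generic state shared by every backend. */
struct BaseContainer {
    const ContainerOps *ops;
    unsigned long page_size;    /* one page size, assumed a power of two */
};

/* Backend-specific state with the base embedded as a member. */
typedef struct LegacyContainer {
    BaseContainer obj;
    int fd;
    unsigned long mapped_bytes;
} LegacyContainer;

static int legacy_dma_map(BaseContainer *bc, unsigned long iova,
                          unsigned long size)
{
    /* Recover the derived object from the embedded base pointer. */
    LegacyContainer *c = container_of(bc, LegacyContainer, obj);

    if (size == 0 || (iova & (bc->page_size - 1))) {
        return -1;              /* reject empty or misaligned requests */
    }
    c->mapped_bytes += size;    /* a real backend would issue an ioctl on c->fd */
    return 0;
}

static bool legacy_check_extension(BaseContainer *bc, int feat)
{
    (void)bc;
    return feat == 0;           /* pretend only feature 0 is supported */
}

static const ContainerOps legacy_ops = {
    .dma_map = legacy_dma_map,
    .check_extension = legacy_check_extension,
};

/* Generic wrappers: callers only ever see BaseContainer. */
static int base_container_dma_map(BaseContainer *bc, unsigned long iova,
                                  unsigned long size)
{
    if (!bc->ops || !bc->ops->dma_map) {
        return -1;
    }
    return bc->ops->dma_map(bc, iova, size);
}

static bool base_container_check_extension(BaseContainer *bc, int feat)
{
    return bc->ops && bc->ops->check_extension &&
           bc->ops->check_extension(bc, feat);
}

static void legacy_container_init(LegacyContainer *c, int fd)
{
    assert(c != NULL);
    c->obj.ops = &legacy_ops;
    c->obj.page_size = 0x1000;
    c->fd = fd;
    c->mapped_bytes = 0;
}

/* Map one aligned and one misaligned range; return the bytes accounted. */
static unsigned long demo_dispatch(void)
{
    LegacyContainer c;

    legacy_container_init(&c, -1);
    (void)base_container_dma_map(&c.obj, 0x2000, 0x3000);   /* accepted */
    (void)base_container_dma_map(&c.obj, 0x2001, 0x1000);   /* rejected */
    return c.mapped_bytes;
}
```

With something like this, an iommufd backend would only need its own struct embedding BaseContainer plus its own ContainerOps table, and nothing would become visible through QMP. Whether that is preferable to QOM depends on whether the backend choice should be user-visible, which is the open question above.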


--
Regards,
Yi Liu


