在 2021/7/14 下午5:14, Eugenio Perez Martin 写道:
On Wed, Jul 14, 2021 at 8:54 AM Eugenio Perez Martin <eperezma@xxxxxxxxxx> wrote:On Wed, Jul 14, 2021 at 5:04 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:在 2021/6/1 下午4:15, Eugenio Perez Martin 写道:On Mon, May 31, 2021 at 11:40 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:在 2021/5/20 上午12:28, Eugenio Pérez 写道:This tree is able to look for a translated address from a IOVA address. At first glance is similar to util/iova-tree. However, SVQ working on devices with limited IOVA space need more capabilities, like allocating IOVA chunks or perform reverse translations (qemu addresses to iova). Starting a sepparated implementation. Knowing than insertions/deletions will not be as frequent as searches,This might not be true if vIOMMU is enabled.Right.it uses an ordered array at implementation.I wonder how much overhead could g_array be if it needs to grow.I didn't do any tests, actually. But I see this VhostIOVATree as a replaceable tool, just to get the buffer translations to work. So I'm both ok with change it now and ok to delay it, since they should not be hard to do.A different name could be used, but ordered searchable array is a little bit long though.Note that we had a very good example for this. That is the kernel iova allocator which is implemented via rbtree. Instead of figuring out g_array vs g_tree stuffs, I would simple go with g_tree first (based on util/iova-tree) and borrow the well design kernel iova allocator API to have a generic IOVA one instead of coupling it with vhost. It could be used by other userspace driver in the future: init_iova_domain()/put_iova_domain(); alloc_iova()/free_iova(); find_iova();We could go that way, but then the iova-tree should be extended to support both translations (iova->translated_addr is now implemented in iova-tree, the reverse is not). If I understood you correctly, borrowing the kernel iova allocator would give us both, right?No the reverse lookup is done via a specific IOMMU driver if I understand it correctly. And if the mapping is 1:1 we can just use two iova-tree I guess.I did try with two IOVATree, but the usage of the reversed one is confusing at best. To reuse most of the code, .iova needs to be .translated_addr, and the opposite. Maybe I can try again using a wrapper structure that reverses them on each operation (insert, search, ...). Thanks!Another feature is also needed that IOVATree does not support: Allocation, as the searching of a free hole in iova range [1]. Linux's alloc_iova is actually the equivalent of vhost_iova_tree_insert in this commit, that expects iova addresses to be specified, not "allocated".
Yes, that's why need a general iova allocator. (But use it for shadow virtqueue first for speeding up the merge).
My first attempt to solve that was to add a second (third?) tree with the free space. But this again complicates the code a lot, and misses a few optimization opportunities (like the need of searching in many trees instead of just one, and then reuse the result [2]. Maybe iova_tree_foreach can be modified to achieve this, but then more code needs to be modified.
Ok, I think you can propose what you think better and let's see the code then.
[1] Range themselves are not supported in IOVATree natively, but I think they are more or less straightforward to implement in vhost-vdpa.
You mean in the kernel? We only have the abstraction of entries which is kind of but more than just a simple range.
Thanks
Note that it is not coupled to vhost at all except in the name: all the implementations only work with hwaddr and void pointers memory. Just to illustrate the point, I think it could be a drop-in replacement for iova-tree at this moment (with all the drawbacks/advantages of an array vs tree).Ok. ThanksAnother reference is the iova allocator that is implemented in VFIO.I will check this too. Thanks!ThanksSigned-off-by: Eugenio Pérez <eperezma@xxxxxxxxxx> --- hw/virtio/vhost-iova-tree.h | 50 ++++++++++ hw/virtio/vhost-iova-tree.c | 188 ++++++++++++++++++++++++++++++++++++ hw/virtio/meson.build | 2 +- 3 files changed, 239 insertions(+), 1 deletion(-) create mode 100644 hw/virtio/vhost-iova-tree.h create mode 100644 hw/virtio/vhost-iova-tree.c diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h new file mode 100644 index 0000000000..2a44af8b3a --- /dev/null +++ b/hw/virtio/vhost-iova-tree.h @@ -0,0 +1,50 @@ +/* + * vhost software live migration ring + * + * SPDX-FileCopyrightText: Red Hat, Inc. 2021 + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@xxxxxxxxxx> + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H +#define HW_VIRTIO_VHOST_IOVA_TREE_H + +#include <gmodule.h> + +#include "exec/memory.h" + +typedef struct VhostDMAMap { + void *translated_addr; + hwaddr iova; + hwaddr size; /* Inclusive */ + IOMMUAccessFlags perm; +} VhostDMAMap; + +typedef enum VhostDMAMapNewRC { + VHOST_DMA_MAP_OVERLAP = -2, + VHOST_DMA_MAP_INVALID = -1, + VHOST_DMA_MAP_OK = 0, +} VhostDMAMapNewRC; + +/** + * VhostIOVATree + * + * Store and search IOVA -> Translated mappings. + * + * Note that it cannot remove nodes. + */ +typedef struct VhostIOVATree { + /* Ordered array of reverse translations, IOVA address to qemu memory. */ + GArray *iova_taddr_map; +} VhostIOVATree; + +void vhost_iova_tree_new(VhostIOVATree *iova_rm); +void vhost_iova_tree_destroy(VhostIOVATree *iova_rm); + +const VhostDMAMap *vhost_iova_tree_find_taddr(const VhostIOVATree *iova_rm, + const VhostDMAMap *map); +VhostDMAMapNewRC vhost_iova_tree_insert(VhostIOVATree *iova_rm, + VhostDMAMap *map); + +#endif diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c new file mode 100644 index 0000000000..dfd7e448b5 --- /dev/null +++ b/hw/virtio/vhost-iova-tree.c @@ -0,0 +1,188 @@ +/* + * vhost software live migration ring + * + * SPDX-FileCopyrightText: Red Hat, Inc. 2021 + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@xxxxxxxxxx> + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "vhost-iova-tree.h" + +#define G_ARRAY_NOT_ZERO_TERMINATED false +#define G_ARRAY_NOT_CLEAR_ON_ALLOC false + +/** + * Inserts an element after an existing one in garray. + * + * @array The array + * @prev_elem The previous element of array of NULL if prepending[2] If I remember correctly.+ * @map The DMA map + * + * It provides the aditional advantage of being type safe over + * g_array_insert_val, which accepts a reference pointer instead of a value + * with no complains. + */ +static void vhost_iova_tree_insert_after(GArray *array, + const VhostDMAMap *prev_elem, + const VhostDMAMap *map) +{ + size_t pos; + + if (!prev_elem) { + pos = 0; + } else { + pos = prev_elem - &g_array_index(array, typeof(*prev_elem), 0) + 1; + } + + g_array_insert_val(array, pos, *map); +} + +static gint vhost_iova_tree_cmp_iova(gconstpointer a, gconstpointer b) +{ + const VhostDMAMap *m1 = a, *m2 = b; + + if (m1->iova > m2->iova + m2->size) { + return 1; + } + + if (m1->iova + m1->size < m2->iova) { + return -1; + } + + /* Overlapped */ + return 0; +} + +/** + * Find the previous node to a given iova + * + * @array The ascending ordered-by-translated-addr array of VhostDMAMap + * @map The map to insert + * @prev Returned location of the previous map + * + * Return VHOST_DMA_MAP_OK if everything went well, or VHOST_DMA_MAP_OVERLAP if + * it already exists. It is ok to use this function to check if a given range + * exists, but it will use a linear search. + * + * TODO: We can use bsearch to locate the entry if we save the state in the + * needle, knowing that the needle is always the first argument to + * compare_func. + */ +static VhostDMAMapNewRC vhost_iova_tree_find_prev(const GArray *array, + GCompareFunc compare_func, + const VhostDMAMap *map, + const VhostDMAMap **prev) +{ + size_t i; + int r; + + *prev = NULL; + for (i = 0; i < array->len; ++i) { + r = compare_func(map, &g_array_index(array, typeof(*map), i)); + if (r == 0) { + return VHOST_DMA_MAP_OVERLAP; + } + if (r < 0) { + return VHOST_DMA_MAP_OK; + } + + *prev = &g_array_index(array, typeof(**prev), i); + } + + return VHOST_DMA_MAP_OK; +} + +/** + * Create a new IOVA tree + * + * @tree The IOVA tree + */ +void vhost_iova_tree_new(VhostIOVATree *tree) +{ + assert(tree); + + tree->iova_taddr_map = g_array_new(G_ARRAY_NOT_ZERO_TERMINATED, + G_ARRAY_NOT_CLEAR_ON_ALLOC, + sizeof(VhostDMAMap)); +} + +/** + * Destroy an IOVA tree + * + * @tree The iova tree + */ +void vhost_iova_tree_destroy(VhostIOVATree *tree) +{ + g_array_unref(g_steal_pointer(&tree->iova_taddr_map)); +} + +/** + * Perform a search on a GArray. + * + * @array Glib array + * @map Map to look up + * @compare_func Compare function to use + * + * Return The found element or NULL if not found. + * + * This can be replaced with g_array_binary_search (Since glib 2.62) when that + * is common enough. + */ +static const VhostDMAMap *vhost_iova_tree_bsearch(const GArray *array, + const VhostDMAMap *map, + GCompareFunc compare_func) +{ + return bsearch(map, array->data, array->len, sizeof(*map), compare_func); +} + +/** + * Find the translated address stored from a IOVA address + * + * @tree The iova tree + * @map The map with the memory address + * + * Return the stored mapping, or NULL if not found. + */ +const VhostDMAMap *vhost_iova_tree_find_taddr(const VhostIOVATree *tree, + const VhostDMAMap *map) +{ + return vhost_iova_tree_bsearch(tree->iova_taddr_map, map, + vhost_iova_tree_cmp_iova); +} + +/** + * Insert a new map + * + * @tree The iova tree + * @map The iova map + * + * Returns: + * - VHOST_DMA_MAP_OK if the map fits in the container + * - VHOST_DMA_MAP_INVALID if the map does not make sense (like size overflow) + * - VHOST_DMA_MAP_OVERLAP if the tree already contains that map + * Can query the assignated iova in map. + */ +VhostDMAMapNewRC vhost_iova_tree_insert(VhostIOVATree *tree, + VhostDMAMap *map) +{ + const VhostDMAMap *prev; + int find_prev_rc; + + if (map->translated_addr + map->size < map->translated_addr || + map->iova + map->size < map->iova || map->perm == IOMMU_NONE) { + return VHOST_DMA_MAP_INVALID; + } + + /* Check for duplicates, and save position for insertion */ + find_prev_rc = vhost_iova_tree_find_prev(tree->iova_taddr_map, + vhost_iova_tree_cmp_iova, map, + &prev); + if (find_prev_rc == VHOST_DMA_MAP_OVERLAP) { + return VHOST_DMA_MAP_OVERLAP; + } + + vhost_iova_tree_insert_after(tree->iova_taddr_map, prev, map); + return VHOST_DMA_MAP_OK; +} diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build index 8b5a0225fe..cb306b83c6 100644 --- a/hw/virtio/meson.build +++ b/hw/virtio/meson.build @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c')) virtio_ss = ss.source_set() virtio_ss.add(files('virtio.c')) -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c')) +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c')) virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c')) virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c')) virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
_______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization