On 9/11/2023 11:23 PM, Jason Wang wrote:
On Tue, Sep 12, 2023 at 7:31 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
Hi Jason,
On 9/10/2023 8:42 PM, Jason Wang wrote:
Hi Si-Wei:
On Sat, Sep 9, 2023 at 9:34 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On-chip IOMMU parent driver could use it to restore memory mapping
to the initial state.
As discussed before. On-chip IOMMU is the hardware details that need
to be hidden by the vDPA bus.
I guess today this is exposed to the bus driver layer already, for e.g.
vhost_vdpa_map() can call into the .dma_map, or .set_map, or
iommu_map() flavors depending on the specific hardware IOMMU
implementation underneath? Specifically, "struct iommu_domain *domain"
is now part of "struct vhost_vdpa" at an individual bus driver
(vhost-vdpa), rather than being wrapped around under the vdpa core
"struct vdpa_device" as vdpa device level object. Do we know for what
reason the hardware details could be exposed to bus callers like
vhost_vdpa_map and vhost_vdpa_general_unmap, while it's prohibited for
other similar cases on the other hand? Or is there a boundary in between
I was not aware of?
Let me try to explain:
set_map(), dma_map(), dma_unmap() is used for parent specific
mappings. It means the parents want to do vendor specific setup for
the mapping. The abstraction of translation is still one dimension
(thought the actual implementation in the parent could be two
dimensions). So it's not necessarily the on-chip stuff (see the
example of the VDUSE).
That means we never expose two dimension mappings like (on-chip)
beyond the bus. So it's not one dimension vs two dimensions but the
platform specific mappings vs vendor specific mappings.
OK, I think I saw on-chip was used interchangeably for vendor specific
means of mapping even for VDUSE. While I think we both agreed it's too
complex to expose the details of two-dimensions and we should try to
avoid that (I thought on-chip doesn't imply two-dimension but just the
vendor specific part). That's the reason why I hide this special detail
under a simple .reset_map interface such that we could easily decouple
mapping from virtio life cycle (device reset).
I think a more fundamental question I don't quite understand, is adding
an extra API to on-chip IOMMU itself an issue, or just that you don't
like the way how the IOMMU model gets exposed via this specific API of
.reset_map?
extra API to on-chip IOMMU, since the on-chip logics should be hidden
by the bus unless we want to introduce the two dimensions abstraction
(which seems to be an overkill).
Thanks for clarifications of your concern. I will rephrase on-chip to
"vendor specific" and try to avoid mentioning the two-dimension aspect
of the API.
For the platform IOMMU case, internally there exists
distinction between the 1:1 identify (passthrough) mode and DMA page
mapping mode, and this distinction is somehow getting exposed and
propagated through the IOMMU API - for e.g. iommu_domain_alloc() and
iommu_attach_device() are being called explicitly from
vhost_vdpa_alloc_domain() by vhost-vdpa (and the opposite from within
vhost_vdpa_free_domain), while for virtio-vdpa it doesn't call any IOMMU
API at all on the other hand
It's the way the kernel manages DMA mappings. For a userspace driver
via vhost-vDPA, it needs to call IOMMU APIs. And for a kernel driver
via virtio-vDPA, DMA API is used (via the dma_dev exposed through
virtio_vdpa). DMA API may decide to call IOMMU API if IOMMU is enabled
but not in passthrough mode.
Right, I think what I meant is, distinction of mapping requirement
exists between two bus drivers, vhost-vdpa and virtio-vdpa. It's
impossible to hide every details (identity, swiotlb, dmar) under the
cover of DMA API simply using the IOMMU API abstraction. Same applies to
how one dimension oriented vendor specific API ( .dma_map/.set_map I
mean) can't cover all cases of potentially multi-dimensional mapping
requirements from virtio-vdpa (which is using a feature rich DMA API
instead of simple and lower level page mapping based IOMMU API). I now
get that you may want to understand why .reset_map is required and which
part of the userspace functionality won't work without it, on the other
hand.
- which is to inherit what default IOMMU
domain has.
Yes, but it's not a 1:1 (identify) mapping, it really depends on the
configuration. (And there could even be a swiotlb layer in the
middle).
Yes, so I said inherit the configuration of the default domain, which
could vary versus one-dimension.
Ideally for on-chip IOMMU we can and should do pretty much
the same, but I don't think there's a clean way without introducing any
driver API to make vhost-vdpa case distinguish from the virtio-vdpa
case. I'm afraid to say that it was just a hack to hide the necessary
distinction needed by vdpa bus users for e.g. in the deep of
vdpa_reset(), if not introducing any new driver API is the goal here...
So rest_map() is fine if it is not defined just for on-chip. For
example, does VDUSE need to implement it or not?
If "on-chip" of what you said means "two-dimension" or "identity
mapping" in the context I would say it's definitely not the intent.
Instead, it's the best I can think of what is able to not expose that
part of the specifics.
VDUSE can implement it if it has similar requirement of resetting
mapping to the default state for bus driver users like vhost-vdpa. This
is left up to the VDUSE owner to decide, though from what I collect so
far it doesn't seem have to do so for the moment, as it's explicitly
using DMA ops to implement swiotlb like bouncing mechanism which works
more closely to the DMA API. And its vhost-vdpa usage seems not page
pinning based but through shared memory? Losing the performance reason
to decouple mapping.
Exposing this will complicate the implementation of bus drivers.
As said above, this distinction is needed by bus drivers, and it's
already done by platform IOMMU via IOMMU API. I can drop the .reset_map
API while add another set of similar driver API to mimic
iommu_domain_alloc/iommu_domain_free, but doing this will complicate the
parent driver's implementation on the other hand.
I'm not sure I understand the issue. But something like PD
allocation/free in RDMA?
Never mind, this would just expose more implementation details on the
two-dimension (or maybe multi-dimensional) mapping model and also
introduces more complexity. Certainly not something I'd advocate for.
While .reset_map is
what I can think of to be the simplest for parent, I can do the other
way if you're fine with it. Let me know how it sounds.
I think what I still don't understand is: how is reset_map() related
to persistent IOTLB? I guess it's a must but I still didn't figure out
why.
Hope my follow-up response to patch 2 and 4 got it clarified? If not let
me know which part I may be missing.
Thanks,
-Siwei
Thanks
Thanks,
-Siwei
Thanks
Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
---
include/linux/vdpa.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 17a4efa..daecf55 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -324,6 +324,12 @@ struct vdpa_map_file {
* @iova: iova to be unmapped
* @size: size of the area
* Returns integer: success (0) or error (< 0)
+ * @reset_map: Reset device memory mapping (optional)
+ * Needed for device that using device
+ * specific DMA translation (on-chip IOMMU)
+ * @vdev: vdpa device
+ * @asid: address space identifier
+ * Returns integer: success (0) or error (< 0)
* @get_vq_dma_dev: Get the dma device for a specific
* virtqueue (optional)
* @vdev: vdpa device
@@ -401,6 +407,7 @@ struct vdpa_config_ops {
u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
u64 iova, u64 size);
+ int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
--
1.8.3.1
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization