RE: [iommu] [PATCH v4 01/16] docs: Document IO Address Space ID (IOASID) APIs

> From: Jacob Pan
> Sent: Friday, February 19, 2021 5:21 AM
> 
> IOASID is used to identify address spaces that can be targeted by device
> DMA. It is a system-wide resource that is essential to its many users.
> This document is an attempt to help developers from all vendors navigate
> the APIs. At this time, ARM SMMU and Intel’s Scalable IO Virtualization

Intel VT-d? SIOV represents just one usage of IOASID (and is actually not
the main target in this context).

> (SIOV) enabled platforms are the primary users of IOASID. Examples of
> how SIOV components interact with the IOASID APIs are provided.

Ditto. Here we just use VT-d as an example.

> 
> Cc: Jonathan Corbet <corbet@xxxxxxx>
> Cc: linux-doc@xxxxxxxxxxxxxxx
> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> Signed-off-by: Wu Hao <hao.wu@xxxxxxxxx>
> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> ---
>  Documentation/driver-api/ioasid.rst | 696 ++++++++++++++++++++++++++++
>  1 file changed, 696 insertions(+)
>  create mode 100644 Documentation/driver-api/ioasid.rst
> 
> diff --git a/Documentation/driver-api/ioasid.rst b/Documentation/driver-api/ioasid.rst
> new file mode 100644
> index 000000000000..3dc337eb4471
> --- /dev/null
> +++ b/Documentation/driver-api/ioasid.rst
> @@ -0,0 +1,696 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. ioasid:
> +
> +=====================
> + IO Address Space ID
> +=====================
> +
> +IOASIDs are used to identify virtual address spaces that DMA requests can
> +target. It is a generic name for PCIe Process Address ID (PASID) or
> +SubstreamID defined by ARM's SMMU.
> +
> +The primary use cases for IOASIDs are Shared Virtual Address (SVA) and
> +IO Virtual Address (IOVA) when multiple address spaces per device are
> +desired. Due to hardware architectural differences the requirements for
> +IOASID management can vary in terms of namespace, state management, and
> +virtualization usages.
> +
> +The IOASID subsystem consists of three components:
> +
> +- IOASID core: provides APIs for allocation, pool management,
> +  notifications and refcounting. See Documentation/driver-api/ioasid.rst

This file? The path points at the very document being added here.

> +  for details
> +- IOASID user:  provides user allocation interface via /dev/ioasid

Mark it as TODO.

> +- IOASID cgroup controller: manage resource distribution

this needs a link.

> +
> +This document covers the features supported by the IOASID core APIs.
> +Vendor-specific use cases are also illustrated with Intel's VT-d
> +based platforms as the first example. The term PASID and IOASID are used
> +interchangeablly throughout this document.

interchangeably

> +
> +.. contents:: :local:
> +
> +Glossary
> +========
> +PASID - Process Address Space ID
> +
> +IOVA - IO Virtual Address
> +
> +IOASID - IO Address Space ID (generic term for PCIe PASID and
> +SubstreamID in SMMU)
> +
> +SVA/SVM - Shared Virtual Addressing/Memory
> +
> +gSVA - Guest Shared Virtual Addressing, nested translation is used
> +
> +gIOVA - Guest IO Virtual Addressing, nested translation is used

Nested translation is not mandatory. It is also redundant information
from a glossary point of view.

> +
> +ENQCMD - Instruction to submit work to shared workqueues. Refer
> +to "Intel X86 ISA for efficient workqueue submission" [1]
> +
> +DSA - Intel Data Streaming Accelerator [2]
> +
> +VDCM - Virtual Device Composition Module [3]
> +
> +SIOV - Intel Scalable IO Virtualization
> +
> +DWQ - Dedicated Work Queue
> +
> +SWQ - Shared Work Queue
> +
> +1. https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> +
> +2. https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
> +
> +3. https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
> +
> +
> +Key Concepts
> +============
> +
> +IOASID Set
> +----------
> +An IOASID set is a group of IOASIDs allocated from the system-wide
> +IOASID pool. Refer to section "IOASID Set Level APIs" for more details.
> +
> +IOASID set is particularly useful for guest SVA where each guest could
> +have its own IOASID set for security and efficiency reasons.
> +
> +Guest IOASID
> +------------------
> +IOASID used by the guest, identifies a guest IOVA space or a guest VA
> +space per guest process.
> +
> +Host IOASID
> +-----------------
> +IOASID used by the host either for bare metal SVA or as the backing of a
> +guest IOASID.
> +
> +
> +IOASID Set Private ID (SPID)
> +----------------------------
> +Each IOASID set has a private namespace of SPIDs. An SPID maps to a
> +single system-wide IOASID. Conversely, each IOASID may be associated
> +with an alias ID, local to the IOASID set, named SPID.
> +SPIDs can be used as guest IOASIDs where each guest could do
> +IOASID allocation from its own pool/set and map them to host physical
> +IOASIDs. SPIDs are particularly useful for supporting live migration
> +where decoupling guest and host physical resources are necessary.

Add one sentence explaining why the kernel needs to store such
information, so that readers can catch the basic intention.

> +
> +For example, two VMs can both allocate guest PASID/SPID #101 but map to
> +different host PASIDs #201 and #202 respectively as shown in the
> +diagram below.
> +::
> +
> + .------------------.    .------------------.
> + |   VM 1           |    |   VM 2           |
> + |                  |    |                  |
> + |------------------|    |------------------|
> + | GPASID/SPID 101  |    | GPASID/SPID 101  |
> + '------------------'    -------------------'     Guest
> + __________|______________________|____________________
> +           |                      |               Host
> +           v                      v
> + .------------------.    .------------------.
> + | Host IOASID 201  |    | Host IOASID 202  |
> + '------------------'    '------------------'
> + |   IOASID set 1   |    |   IOASID set 2   |
> + '------------------'    '------------------'
> +
> +Guest PASID is treated as IOASID set private ID (SPID) within an
> +IOASID set, mappings between guest and host IOASIDs are stored in the
> +set for inquiry.

the example could be in a separate sub-section, as a summary to
connect all the conceptual blocks together.
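
To make the kind of summary I have in mind concrete, here is a minimal
userspace sketch of the two-VM example. It is a toy model only: every
toy_* name is hypothetical and none of it is the proposed kernel API. It
just shows that an SPID is resolved within one set, so SPID 101 can back
different host IOASIDs in different sets.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: every name here is hypothetical. */
#define TOY_SET_CAPACITY 8
#define TOY_INVALID_ID   0xffffffffu

struct toy_spid_entry {
	unsigned int spid;        /* set-private ID, e.g. a guest PASID */
	unsigned int host_ioasid; /* system-wide IOASID backing it */
};

struct toy_ioasid_set {
	struct toy_spid_entry map[TOY_SET_CAPACITY];
	size_t count;
};

/* Record SPID -> host IOASID in one set; returns 0 on success, -1 on error. */
static int toy_attach_spid(struct toy_ioasid_set *set, unsigned int spid,
			   unsigned int host_ioasid)
{
	size_t i;

	assert(set != NULL);
	for (i = 0; i < set->count; i++)
		if (set->map[i].spid == spid)
			return -1; /* SPID already in use within this set */
	if (set->count == TOY_SET_CAPACITY)
		return -1; /* set is full */
	set->map[set->count].spid = spid;
	set->map[set->count].host_ioasid = host_ioasid;
	set->count++;
	return 0;
}

/* Look up the host IOASID for an SPID; the lookup never crosses sets. */
static unsigned int toy_find_by_spid(const struct toy_ioasid_set *set,
				     unsigned int spid)
{
	size_t i;

	assert(set != NULL);
	for (i = 0; i < set->count; i++)
		if (set->map[i].spid == spid)
			return set->map[i].host_ioasid;
	return TOY_INVALID_ID;
}
```

With one set per VM, attaching (101, 201) to the first set and (101, 202)
to the second makes a lookup of SPID 101 return 201 or 202 depending on
which set is queried, which is the picture the diagram is trying to draw.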

> +
> +Theory of Operation
> +===================
> +
> +States
> +------
> +IOASID has four states as illustrated in the diagram below.
> +::
> +
> +   BIND/UNBIND, WQ PROG/CLEAR
> +   -----------------------------.
> +                                |
> +   ALLOC                        |
> +   ------------.                |
> +               |                |
> +   +-------+   v    +-------+   v     +----------+
> +   | FREE  |=======>| IDLE¹ |========>| ACTIVE²  |
> +   +-------+        +-------+         +----------+
> +      ^                                    |
> +      |           +---------------+        |
> +      '===========| FREE PENDING³ |<======='
> +                  +---------------+  ^
> +   FREE                              |
> +   ----------------------------------'
> +   ¹ Allocated but not used
> +   ² Used by device drivers or CPU, each user holds a reference

what about IOMMU?

> +   ³ Waiting for all users drop their refcount before returning IOASID
> +   back to the pool
> +

There is no background on BIND/UNBIND, ALLOC, FREE, WQ PROG/CLEAR, etc...

What about IDLE->FREE and ACTIVE->IDLE? Would it be clearer to just
describe each state in text (and which actions may lead to it)?
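
For illustration, the transitions that the diagram actually draws can be
written down as a small table-like function. This is a userspace sketch
with hypothetical names (toy_state, toy_event, toy_transition), not the
proposed implementation. BIND stands in for "BIND or WQ PROG", and
LAST_PUT is my label for "all users dropped their refcount". IDLE->FREE
and ACTIVE->IDLE are deliberately rejected here because the document does
not say whether they exist.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: every name here is hypothetical. */
enum toy_state { TOY_FREE, TOY_IDLE, TOY_ACTIVE, TOY_FREE_PENDING };
enum toy_event { TOY_EV_ALLOC, TOY_EV_BIND, TOY_EV_FREE, TOY_EV_LAST_PUT };

/*
 * Apply one event. Returns 0 and updates *state for a transition drawn in
 * the diagram; returns -1 and leaves *state untouched for anything else.
 */
static int toy_transition(enum toy_state *state, enum toy_event ev)
{
	assert(state != NULL);

	switch (*state) {
	case TOY_FREE:
		if (ev == TOY_EV_ALLOC) {
			*state = TOY_IDLE;         /* allocated but not used */
			return 0;
		}
		break;
	case TOY_IDLE:
		if (ev == TOY_EV_BIND) {
			*state = TOY_ACTIVE;       /* in use, users hold references */
			return 0;
		}
		break;
	case TOY_ACTIVE:
		if (ev == TOY_EV_FREE) {
			*state = TOY_FREE_PENDING; /* waiting for users to drop refs */
			return 0;
		}
		break;
	case TOY_FREE_PENDING:
		if (ev == TOY_EV_LAST_PUT) {
			*state = TOY_FREE;         /* returned to the pool */
			return 0;
		}
		break;
	}
	return -1; /* not drawn in the diagram */
}
```

Writing it this way makes the gaps obvious: a FREE event on an IDLE
IOASID returns -1 in this sketch, and the document should say what really
happens in that case.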

> +
> +Notifications
> +-------------
> +Depending on the hardware architecture, an IOASID can be programmed into
> +CPU, IOMMU, or devices for DMA related activity. The synchronization among them
> +is based on events notifications which follows a publisher-subscriber pattern.
> +
> +Events
> +~~~~~~
> +Notification events are pertinent to individual IOASIDs, they can be
> +one of the following::
> +
> + - ALLOC
> + - FREE
> + - BIND
> + - UNBIND

Again, there is no explanation of when those events are triggered.

> +
> +Ordering
> +~~~~~~~~
> +Ordering of notification events is supported by the IOASID core as the
> +following (from high to low)::
> +
> + - CPU
> + - IOMMU
> + - DEVICE
> +
> +Subscribers of IOASID events are responsible for registering their
> +notification blocks according to the priorities.
> +
> +The above order applies to all events. For examine, the UNBIND event is
> +issued when a guest IOASID is freed due to exceptions. All active DMA

What exceptions? Why can't the UNBIND event be triggered in the normal
path, e.g. when the guest requests to unbind a page table?

> +sources should be quiesced before tearing down other hardware contexts

only for the said IOASID

> +in the system. This is necessary to reduce the churn in handling faults.
> +The notification order ensures that vCPU is stopped before IOMMU and
> +devices.

The vCPU is never stopped in the whole flow. It's just about tearing down
IOASID-related info in the CPU.

And why do we need to enforce such an order? This needs some text...
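
As a reference point for that text, the ordering itself is easy to state
in code. Below is a minimal userspace sketch of a priority-ordered
notifier chain. The toy_* names and the numeric priority values are
hypothetical, and this is neither the kernel notifier API nor the
proposed ioasid code. Handlers are kept sorted so that CPU runs before
IOMMU, which runs before DEVICE, regardless of registration order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: every name here is hypothetical. */
#define TOY_MAX_NOTIFIERS 8

/* Higher value runs first, mirroring the CPU > IOMMU > DEVICE order. */
enum toy_prio { TOY_PRIO_DEVICE = 1, TOY_PRIO_IOMMU = 2, TOY_PRIO_CPU = 3 };

typedef void (*toy_handler_t)(unsigned int ioasid, void *data);

struct toy_notifier {
	toy_handler_t handler;
	void *data;
	enum toy_prio prio;
};

struct toy_chain {
	struct toy_notifier nb[TOY_MAX_NOTIFIERS];
	size_t count;
};

/* Insert so the array stays sorted from highest to lowest priority. */
static int toy_register(struct toy_chain *c, toy_handler_t h, void *data,
			enum toy_prio prio)
{
	size_t pos, i;

	assert(c != NULL && h != NULL);
	if (c->count == TOY_MAX_NOTIFIERS)
		return -1;
	for (pos = 0; pos < c->count; pos++)
		if (c->nb[pos].prio < prio)
			break;
	for (i = c->count; i > pos; i--)
		c->nb[i] = c->nb[i - 1];
	c->nb[pos].handler = h;
	c->nb[pos].data = data;
	c->nb[pos].prio = prio;
	c->count++;
	return 0;
}

/* Call every handler for one IOASID event, highest priority first. */
static void toy_notify(const struct toy_chain *c, unsigned int ioasid)
{
	size_t i;

	for (i = 0; i < c->count; i++)
		c->nb[i].handler(ioasid, c->nb[i].data);
}

/* Sample subscriber: appends its one-letter tag to a shared trace. */
struct toy_trace { char order[TOY_MAX_NOTIFIERS + 1]; size_t len; };
struct toy_sub { struct toy_trace *trace; char tag; };

static void toy_record(unsigned int ioasid, void *data)
{
	struct toy_sub *sub = data;

	(void)ioasid; /* unused in this sample handler */
	if (sub->trace->len < TOY_MAX_NOTIFIERS)
		sub->trace->order[sub->trace->len++] = sub->tag;
}
```

Registering subscribers tagged 'D', 'C' and 'I' in that order and then
notifying once leaves "CID" in the trace. What the sketch cannot show,
and what the document still has to explain, is why this particular order
is required.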

> +Besides calling ioasid_notify directly, notifications can also be sent

Besides 'who' calling ioasid_notify? The caller should be named.

> +by the IOASID core as a by-product of calling the following APIs::
> +
> + - ioasisd_free()        /* emits IOASID_FREE */
> + - ioasid_detach_spid()  /* emits IOASID_UNBIND */
> + - ioasid_attach_spid()  /* emits IOASID_BIND */
> +
> +It is the callers responsibility to avoid chained notifications in the

callers -> caller's.

And why is chained notification a problem? Would it be clearer to
consolidate this part with the Atomicity section?

> +atomic notification handlers. i.e. ioasid_detach_spid() cannot be called
> +inside the IOASID_FREE atomic handlers. However, ioasid_detach_spid() can
> +be called from deferred work. See Atomicity section for details.
> +
> +Level Sensitivity
> +~~~~~~~~~~~~~~~~~
> +For each IOASID state transition, IOASID core ensures that there is
> +only one notification sent. This resembles level triggered interrupt
> +where a single interrupt is raised during a state transition.
> +For example, if ioasid_free() is called twice by a user before the
> +IOASID is reclaimed, IOASID core will only send out a single
> +IOASID_NOTIFY_FREE event. Similarly, for IOASID_NOTIFY_BIND/UNBIND
> +events, which is only sent out once when a SPID is attached/detached.
> +
> +Scopes
> +~~~~~~
> +There are two types of notifiers in IOASID core: system-wide and
> +ioasid_set-wide.

ioasid_set-wide -> "per ioasid_set"

> +
> +System-wide notifier is catering for users that need to handle all the
> +IOASIDs in the system. E.g. The IOMMU driver.
> +
> +Per ioasid_set notifier can be used by VM specific components such as
> +KVM. After all, each KVM instance only cares about IOASIDs within its
> +own set/guest.

How is the scope specified? The API information is needed here.

> +
> +Atomicity
> +~~~~~~~~~
> +IOASID notifiers are atomic due to spinlocks used inside the IOASID
> +core. For tasks that cannot be completed in the notifier handler,
> +async work can be submitted to the ordered workqueue provided by the

"can be" or "must be"?

> +IOASID core. This will ensure ordered completion of the work items
> +submitted by all users.
> +
> +Reference counting
> +------------------
> +IOASID life cycle management is based on reference counting. Users of
> +IOASID who intend to align its context with the life cycle need to hold
> +references of the IOASID. An IOASID will not be returned to the pool
> +for re-allocation until all its references are dropped. Calling ioasid_free()
> +will mark the IOASID as FREE_PENDING if the IOASID has outstanding
> +references. No new references can be taken by ioasid_get() once an
> +IOASID is in the FREE_PENDING state. ioasid_free() can be called
> +multiple times without an error until all refs are dropped.
> +
> +ioasid_put() decrements and tests refcount of the IOASID. If refcount
> +is 0, ioasid will be freed. The IOASID will be returned to the pool and
> +available for new allocations. Note that ioasid_put() can be called by
> +IOASID_FREE event handler where the IOASID is reclaimed.

The last sentence is unclear.
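
One possible reading is that each user drops its reference from its FREE
handler, and whichever put is last reclaims the IOASID. A toy userspace
model of that reading follows. All toy_* names are hypothetical, and the
reference held on behalf of the allocator is purely my assumption, since
the text does not say how allocation is counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: every name here is hypothetical. */
struct toy_ioasid {
	unsigned int refs;        /* outstanding references */
	bool free_pending;        /* toy_free() called, users still hold refs */
	bool in_pool;             /* returned to the pool for re-allocation */
	unsigned int free_events; /* FREE notifications emitted so far */
};

/* Assumption: allocation itself holds one reference. */
static void toy_alloc(struct toy_ioasid *id)
{
	assert(id != NULL);
	id->refs = 1;
	id->free_pending = false;
	id->in_pool = false;
	id->free_events = 0;
}

/* Take a reference; refused once the ID is FREE_PENDING or reclaimed. */
static bool toy_get(struct toy_ioasid *id)
{
	if (id->in_pool || id->free_pending)
		return false;
	id->refs++;
	return true;
}

/* Drop one reference; returns true if this put returned the ID to the pool. */
static bool toy_put(struct toy_ioasid *id)
{
	assert(id->refs > 0);
	if (--id->refs > 0)
		return false;
	id->free_pending = false;
	id->in_pool = true;
	return true;
}

/* Repeated calls are not an error and emit no second FREE event. */
static void toy_free(struct toy_ioasid *id)
{
	if (id->in_pool || id->free_pending)
		return;
	id->free_pending = true;
	id->free_events++; /* the single FREE notification */
	toy_put(id);       /* drop the allocator's (assumed) reference */
}
```

In this model, a user that took a reference sees one FREE event, cannot
take new references afterwards, and its final toy_put() is what returns
the ID to the pool. If that is the intended meaning, the document should
say so explicitly.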

> +
> +Event notifications are used to inform users of IOASID status change.
> +IOASID_FREE or UNBIND events prompt users to drop their references after
> +clearing its context.
> +
> +For example, on VT-d platform when an IOASID is freed, teardown
> +actions are performed on CPU (KVM), device driver (VDCM), and the IOMMU
> +driver. To quiesce vCPU for work submission, KVM notifier handler must
> +be called before VDCM handler. Therefore, KVM and VDCM shall monitor

It's difficult to understand why KVM needs to listen to such events
without background on ENQCMD and the VMCS PASID translation table.

> +notification events IOASID_UNBIND. As KVM x86 code registers notification
> +block with priority IOASID_PRIO_CPU and VDCM code registers notification
> +block with priority IOASID_PRIO_DEVICE, IOASID core ensures the CPU
> +handlers are called before the DEVICE handlers.

This sounds like an ordering matter, not a refcount one.

> +
> +For both KVM and VDCM, notifier blocks shall be registered on the
> +IOASID set such that *only* events from the matching VM are received.
> +
> +If KVM attempts to register a notifier block before the IOASID set is
> +created using the MM token, the notifier block will be placed on a

Why does the MM token matter here? No background again...

And I have to stop the review here. It looks like you have many tricky
designs to explain, but they are not organized in a clean way and lack
much of the background that others need to even understand the basic
picture. Starting from a simpler but clearer version that covers the
basic working flow, and then expanding it with additional caveats, is
possibly a more reasonable way to go...

Thanks
Kevin



