After the virtio-iommu device has been probed and the driver is aware of
the devices translated by the IOMMU, it can start sending requests to the
virtio-iommu device. The operations described here are deliberately
minimal, so that vIOMMU devices can be as simple as possible to implement,
and can be extended with feature bits.

    I. Overview
    II. Feature bits
    III. Device configuration layout
    IV. Device initialization
    V. Device operations
        1. Attach device
        2. Detach device
        3. Map region
        4. Unmap region


I. Overview
===========

Requests are small buffers added by the guest to the request virtqueue.
The guest can add a batch of them to the queue and send a notification
(kick) to the device to have all of them handled.

Here is an example flow:

* attach(address space, device), kick: create a new address space and
  attach a device to it
* map(address space, virt, phys, size, flags): create a mapping between a
  guest-virtual and a guest-physical address
* map, map, map, kick

* ... here the guest device can perform DMA to the freshly mapped memory

* unmap(address space, virt, size), unmap, kick
* detach(address space, device), kick

The following description attempts to use the same format as other virtio
devices. We won't go into the details of the virtio transport; please
refer to [VIRTIO-v1.0] for more information.

As a quick reminder, the virtio (1.0) transport can be described with the
following flow:

                          HOST : GUEST
                  (3)          :
        .----- [available ring] <-----.  (2)
       /                       :       \
      v       (4)              :   (1)  \
 [device] <--- [descriptor table] <---- [driver]
      \                        :         ^
       \                       :        /  (5)
        '-------> [used ring] ---------'
                               :   (6)
                               :

(1) Driver has a buffer with a payload to send via virtio. It writes the
    address and size of the buffer in a descriptor. It can chain N
    sub-buffers by writing N descriptors and linking them together. The
    first descriptor of the chain is referred to as the head.
(2) Driver queues the head index into the 'available' ring.
(3) Driver notifies the device.
    Since virtio-iommu uses MMIO, notification is done by writing to a
    doorbell address. KVM traps it and forwards the notification to the
    virtio device. Device dequeues the head index from the 'available'
    ring.
(4) Device reads all descriptors in the chain, handles the payload.
(5) Device writes the head index into the 'used' ring and sends a
    notification to the guest, by injecting an interrupt.
(6) Driver pops the head from the used ring, and optionally reads the
    buffers that were updated by the device.


II. Feature bits
================

VIRTIO_IOMMU_F_INPUT_RANGE (0)
    Available range of virtual addresses is described in input_range.

VIRTIO_IOMMU_F_IOASID_BITS (1)
    The number of address spaces supported is described in ioasid_bits.

VIRTIO_IOMMU_F_MAP_UNMAP (2)
    Map and unmap requests are available. This is here to allow a device
    or driver to only implement page-table sharing, once we introduce the
    feature. The device will only be able to select one of F_MAP_UNMAP or
    F_PT_SHARING. For the moment, this bit must always be set.

VIRTIO_IOMMU_F_BYPASS (3)
    When not attached to an address space, devices behind the IOMMU can
    access the physical address space.


III. Device configuration layout
================================

    struct virtio_iommu_config {
        u64 page_size_mask;
        struct virtio_iommu_range {
            u64 start;
            u64 end;
        } input_range;
        u8 ioasid_bits;
    };


IV. Device initialization
=========================

1. page_size_mask contains the bitmask of all page sizes that can be
   mapped. The least significant bit set defines the page granularity of
   IOMMU mappings. Other bits in the mask are hints describing page sizes
   that the IOMMU can merge into a single mapping (page blocks).

   There is no lower limit for the smallest page granularity supported by
   the IOMMU. It is legal for the driver to map one byte at a time if the
   device advertises it.

   page_size_mask must have at least one bit set.

2.
   If the VIRTIO_IOMMU_F_IOASID_BITS feature is negotiated, ioasid_bits
   contains the number of bits supported in an I/O Address Space ID, the
   identifier used in map/unmap requests. A value of 0 is valid, and means
   that a single address space is supported.

   If the feature is not negotiated, address space identifiers can use up
   to 32 bits.

3. If the VIRTIO_IOMMU_F_INPUT_RANGE feature is negotiated, input_range
   contains the virtual address range that the IOMMU is able to translate.
   Any mapping request to virtual addresses outside of this range will
   fail.

   If the feature is not negotiated, virtual mappings span the whole
   64-bit address space (start = 0, end = 0xffffffffffffffff).

4. If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, devices behind the
   IOMMU that are not attached to an address space are allowed to access
   guest-physical addresses. Otherwise, accesses to guest-physical
   addresses may fault.


V. Device operations
====================

Driver sends requests on the request virtqueue (0), notifies the device
and waits for the device to return the request with a status in the used
ring. All requests are split in two parts: one device-readable, one
device-writeable. Each request must therefore be described with at least
two descriptors, as illustrated below.

    31                       7      0
    +--------------------------------+ <------- RO descriptor
    |      0 (reserved)     |  type  |
    +--------------------------------+
    |                                |
    |            payload             |
    |                                | <------- WO descriptor
    +--------------------------------+
    |      0 (reserved)     | status |
    +--------------------------------+

    struct virtio_iommu_req_head {
        u8 type;
        u8 reserved[3];
    };

    struct virtio_iommu_req_tail {
        u8 status;
        u8 reserved[3];
    };

(Note on the format choice: this format forces the payload to be split in
two - one read-only buffer, one write-only. It is necessary and sufficient
for our purpose, and does not close the door to future extensions with
more complex requests, such as a WO field sandwiched between two RO ones.
With virtio 1.0 ring requirements, such a request would need to be
described by two chains of descriptors, which might be more complex to
implement efficiently, but still possible. Both devices and drivers must
assume that requests are segmented anyway.)

Type may be one of:

    VIRTIO_IOMMU_T_ATTACH       1
    VIRTIO_IOMMU_T_DETACH       2
    VIRTIO_IOMMU_T_MAP          3
    VIRTIO_IOMMU_T_UNMAP        4

A few general-purpose status codes are defined here. Driver must not
assume a specific status to be returned for an invalid request. Except for
0, which always means "success", these values are hints to make
troubleshooting easier.

    VIRTIO_IOMMU_S_OK           0   All good! Carry on.
    VIRTIO_IOMMU_S_IOERR        1   Virtio communication error
    VIRTIO_IOMMU_S_UNSUPP       2   Unsupported request
    VIRTIO_IOMMU_S_DEVERR       3   Internal device error
    VIRTIO_IOMMU_S_INVAL        4   Invalid parameters
    VIRTIO_IOMMU_S_RANGE        5   Out-of-range parameters
    VIRTIO_IOMMU_S_NOENT        6   Entry not found
    VIRTIO_IOMMU_S_FAULT        7   Bad address


1. Attach device
----------------

    struct virtio_iommu_req_attach {
        le32 address_space;
        le32 device;
        le32 flags/reserved;
    };

Attach a device to an address space. 'address_space' is an identifier
unique to the guest. If the address space doesn't exist in the IOMMU
device, it is created. 'device' is an identifier unique to the IOMMU. The
host communicates a unique device ID to the guest during boot. The method
used to communicate this ID is outside the scope of this specification,
but the following rules must apply:

* The device ID is unique from the IOMMU point of view. Multiple devices
  whose DMA transactions are not translated by the same IOMMU may have the
  same device ID. Devices whose DMA transactions may be translated by the
  same IOMMU must have different device IDs.

* Sometimes the host cannot completely isolate two devices from each
  other. For example on a legacy PCI bus, devices can snoop DMA
  transactions from their neighbours. In this case, the host must
  communicate to the guest that it cannot isolate these devices from each
  other.
  The method used to communicate this is outside the scope of this
  specification. The IOMMU device must ensure that devices that cannot be
  isolated by the host have the same address spaces.

Multiple devices may be added to the same address space. A device cannot
be attached to multiple address spaces (that is, with the map/unmap
interface. For SVM, see page table and context table sharing proposal.)

If the device is already attached to another address space 'old', it is
detached from the old one and attached to the new one. The device cannot
access mappings from the old address space after this request completes.

The device either returns VIRTIO_IOMMU_S_OK, or an error status. We
suggest the following error statuses, which help debug the driver.

    NOENT: device not found.
    RANGE: address space is outside the range allowed by ioasid_bits.


2. Detach device
----------------

    struct virtio_iommu_req_detach {
        le32 device;
        le32 flags/reserved;
    };

Detach a device from its address space. When this request completes, the
device cannot access any mapping from that address space anymore. If the
device isn't attached to any address space, the request returns
successfully.

After all devices have been successfully detached from an address space,
its ID can be reused by the driver for another address space.

    NOENT: device not found.
    INVAL: device wasn't attached to any address space.


3. Map region
-------------

    struct virtio_iommu_req_map {
        le32 address_space;
        le64 phys_addr;
        le64 virt_addr;
        le64 size;
        le32 flags;
    };

    VIRTIO_IOMMU_MAP_F_READ     0x1
    VIRTIO_IOMMU_MAP_F_WRITE    0x2
    VIRTIO_IOMMU_MAP_F_EXEC     0x4

Map a range of virtually-contiguous addresses to a range of
physically-contiguous addresses. Size must always be a multiple of the
page granularity negotiated during initialization. Both phys_addr and
virt_addr must be aligned on the page granularity. The address space must
have been created with VIRTIO_IOMMU_T_ATTACH.

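As an illustration of the alignment rules for MAP requests, here is a
minimal sketch of how a device or driver might check them. The helper
names viommu_granule() and viommu_map_aligned() are hypothetical and not
part of this proposal; the sketch only assumes that the page granularity
is the least significant bit set in page_size_mask, as described in
section IV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Page granularity: the least significant bit set in page_size_mask.
 * (Hypothetical helper, not defined by the proposal.)
 */
static uint64_t viommu_granule(uint64_t page_size_mask)
{
	/* page_size_mask must have at least one bit set */
	assert(page_size_mask != 0);
	return page_size_mask & (~page_size_mask + 1);
}

/*
 * Return true if phys_addr, virt_addr and size are all multiples of the
 * page granularity. Range and flag checks are deliberately left out.
 */
static bool viommu_map_aligned(uint64_t page_size_mask, uint64_t phys_addr,
			       uint64_t virt_addr, uint64_t size)
{
	uint64_t mask = viommu_granule(page_size_mask) - 1;

	return ((phys_addr | virt_addr | size) & mask) == 0;
}
```

With page_size_mask = 0x40201000 (4k granule, 2M and 1G blocks), a
request with size 0x1800 fails this check, while one with size 0x2000 and
4k-aligned addresses passes.
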
The range defined by (virt_addr, size) must be within the limits specified
by input_range. The range defined by (phys_addr, size) must be within the
guest-physical address space. This includes upper and lower limits, as
well as any carving of guest-physical addresses for use by the host (for
instance MSI doorbells). Guest physical boundaries are set by the host
using a firmware mechanism outside the scope of this specification.

(Note that this format prevents creating the identity mapping in a single
request (0x0 - 0xfff....fff) -> (0x0 - 0xfff...fff), since it would result
in a size of zero. Hopefully allowing VIRTIO_IOMMU_F_BYPASS eliminates the
need for issuing such a request. It would also be unlikely to conform to
the physical range restrictions from the previous paragraph)

(Another note, on flags: it is unlikely that all possible combinations of
flags will be supported by the physical IOMMU. For instance, (W & !R) or
(E & W) might be invalid. I haven't taken time to devise a clever way to
advertise supported and implicit (for instance "W implies R") flags or
combinations thereof for the moment, but I could at least try to research
common models. Keeping in mind that we might soon want to add more flags,
such as privileged, device, transient, shared, etc. whatever these would
mean)

This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
negotiated.

    INVAL: invalid flags
    RANGE: virt_addr, phys_addr or range are not in the limits specified
           during negotiation. For instance, not aligned to page
           granularity.
    NOENT: address space not found.


4. Unmap region
---------------

    struct virtio_iommu_req_unmap {
        le32 address_space;
        le64 virt_addr;
        le64 size;
        le32 reserved;
    };

Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. The range,
defined by virt_addr and size, must exactly cover one or more contiguous
mappings created with MAP requests. All mappings covered by the range are
removed. Driver should not send a request covering unmapped areas.

We define a mapping as a virtual region created with a single MAP request.
virt_addr should exactly match the start of an existing mapping. The end
of the range, (virt_addr + size - 1), should exactly match the end of an
existing mapping. Device must reject any request that would affect only
part of a mapping. If the requested range spills outside of mapped
regions, the device's behaviour is undefined.

These rules are illustrated with the following requests (with arguments
(va, size)), assuming each example sequence starts with a blank address
space:

    map(0, 10)
    unmap(0, 10)        -> allowed

    map(0, 5)
    map(5, 5)
    unmap(0, 10)        -> allowed

    map(0, 10)
    unmap(0, 5)         -> forbidden

    map(0, 10)
    unmap(0, 15)        -> undefined

    map(0, 5)
    map(10, 5)
    unmap(0, 15)        -> undefined

(Note: the semantics of unmap are chosen to be compatible with VFIO's
type1 v2 IOMMU API. This way a device serving as intermediary between
guest and VFIO doesn't have to keep an internal tree of mappings. They are
a bit tighter than VFIO, in that they don't allow unmap spilling outside
mapped regions. Spilling is 'undefined' at the moment, because it should
work in most cases but I don't know if it's worth the added complexity in
devices that are not simply transmitting requests to VFIO. Splitting
mappings won't ever be allowed, but see the relaxed proposal in 3/3 for
more lenient semantics)

This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
negotiated.

    NOENT: address space not found.
    FAULT: mapping not found.
    RANGE: request would split a mapping.


[VIRTIO-v1.0] Virtual I/O Device (VIRTIO) Version 1.0. 03 December 2013.
    Committee Specification Draft 01 / Public Review Draft 01.
    http://docs.oasis-open.org/virtio/virtio/v1.0/csprd01/virtio-v1.0-csprd01.html
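
As an illustration of the unmap rules in section V.4, the sketch below
classifies an UNMAP request against the mappings of one address space.
It is not a proposed implementation: struct mapping, enum unmap_verdict
and classify_unmap() are hypothetical names, and the sketch assumes that
mappings never overlap each other and that the request has a non-zero
size that does not wrap around the 64-bit address space. When a request
both splits a mapping and spills outside mapped regions, it is reported
as forbidden, since the device must reject any split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per successful MAP request (hypothetical bookkeeping). */
struct mapping {
	uint64_t va;
	uint64_t size;
};

enum unmap_verdict { UNMAP_ALLOWED, UNMAP_FORBIDDEN, UNMAP_UNDEFINED };

static enum unmap_verdict classify_unmap(const struct mapping *maps, size_t n,
					 uint64_t va, uint64_t size)
{
	uint64_t last = va + size - 1;	/* inclusive end of the request */
	uint64_t covered = 0;		/* bytes of whole mappings in range */
	size_t i;

	/* Assumption: non-empty request, no 64-bit wrap-around. */
	assert(size != 0 && last >= va);

	for (i = 0; i < n; i++) {
		uint64_t m_first = maps[i].va;
		uint64_t m_last = maps[i].va + maps[i].size - 1;

		if (m_last < va || m_first > last)
			continue;		/* no overlap with the request */
		if (m_first < va || m_last > last)
			return UNMAP_FORBIDDEN;	/* would split this mapping */
		covered += maps[i].size;
	}

	/* Any byte of the request not backed by a mapping is a spill. */
	return covered == size ? UNMAP_ALLOWED : UNMAP_UNDEFINED;
}
```

Applied to the five example sequences of section V.4, this returns
allowed, allowed, forbidden, undefined and undefined respectively.
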