Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion


 




On 2020/11/26 8:36 PM, Stefan Hajnoczi wrote:
On Thu, Nov 26, 2020 at 11:37:30AM +0800, Jason Wang wrote:
On 2020/11/26 3:21 AM, Elena Afanasova wrote:
Hello,

I'm an Outreachy intern with QEMU and I’m working on implementing the
ioregionfd API in KVM.
So I’d like to resume the ioregionfd design discussion. The latest
version of the ioregionfd API document is provided below.

Overview
--------
ioregionfd is a KVM dispatch mechanism for handling MMIO/PIO accesses over a
file descriptor without returning from ioctl(KVM_RUN). This allows device
emulation to run in another task separate from the vCPU task.

This is achieved through KVM ioctls for registering MMIO/PIO regions and a wire
protocol that KVM uses to communicate with a task handling an MMIO/PIO access.

The traditional ioctl(KVM_RUN) dispatch mechanism with device emulation in a
separate task looks like this:

    kvm.ko  <---ioctl(KVM_RUN)---> VMM vCPU task <---messages---> device task

ioregionfd improves performance by eliminating the need for the vCPU task to
forward MMIO/PIO exits to device emulation tasks:

I wonder in which cases we care about performance like this. (Note that
vhost-user has supported set|get_config() for a while.)
NVMe emulation needs this because ioeventfd cannot transfer the value
written to the doorbell. That's why QEMU's NVMe emulation doesn't
support IOThreads.


I think it depends on how many different values can be carried via the doorbell. If there are not too many, we can use datamatch. In any case, virtio supports distinguishing the queue index via the value written to the doorbell.
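
For reference, the datamatch approach could be sketched with the existing KVM_IOEVENTFD ioctl as below. This is only a minimal sketch: the helper names doorbell_match() and register_doorbell_values() are made up for illustration, and it assumes one eventfd per expected doorbell value has already been created.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: build a KVM_IOEVENTFD request that signals `efd`
 * only when the guest writes exactly `value` to the doorbell at `addr`. */
struct kvm_ioeventfd doorbell_match(uint64_t addr, uint32_t len,
                                    uint64_t value, int efd)
{
    struct kvm_ioeventfd req;

    /* datamatch compares a 1, 2, 4 or 8 byte write */
    assert(len == 1 || len == 2 || len == 4 || len == 8);

    memset(&req, 0, sizeof(req));
    req.addr = addr;
    req.len = len;
    req.datamatch = value;
    req.fd = efd;
    req.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
    return req;
}

/* Hypothetical helper: register efds[i] to fire when the guest writes the
 * value i. vm_fd is assumed to be an open KVM VM file descriptor. */
int register_doorbell_values(int vm_fd, uint64_t addr, uint32_t len,
                             const int *efds, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        struct kvm_ioeventfd req = doorbell_match(addr, len, i, efds[i]);

        if (ioctl(vm_fd, KVM_IOEVENTFD, &req) < 0)
            return -1;
    }
    return 0;
}
```

This needs one registration per distinct value, so it only scales to a small set of doorbell values.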



KVM_CREATE_IOREGIONFD
---------------------
:Capability: KVM_CAP_IOREGIONFD
:Architectures: all
:Type: system ioctl
:Parameters: none
:Returns: an ioregionfd file descriptor, -1 on error

This ioctl creates a new ioregionfd and returns the file descriptor. The fd can
be used to handle MMIO/PIO accesses instead of returning from ioctl(KVM_RUN)
with KVM_EXIT_MMIO or KVM_EXIT_PIO. One or more MMIO or PIO regions must be
registered with KVM_SET_IOREGION in order to receive MMIO/PIO accesses on the
fd. An ioregionfd can be used with multiple VMs and its lifecycle is not tied
to a specific VM.

When the last file descriptor for an ioregionfd is closed, all regions
registered with KVM_SET_IOREGION are dropped and guest accesses to those
regions cause ioctl(KVM_RUN) to return again.

I may be missing something, but I don't see any special requirement for this fd.
The fd is just a transport for a protocol between KVM and a userspace process.
So instead of mandating a new type, it might be better to allow any type of fd
to be attached (e.g. a pipe or socket).
pipe(2) is unidirectional on Linux, so it won't work.


Can we accept two file descriptors to make it work?
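
To illustrate the two-descriptor idea, here is a rough sketch in which one pipe carries requests and a second pipe carries responses. Everything in it is an assumption for illustration: struct demo_msg is not the ioregionfd wire format, the helper names are made up, and both ends run in one process just to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical message layout, NOT the ioregionfd wire protocol. */
struct demo_msg {
    uint64_t addr;
    uint64_t data;
};

/* A bidirectional channel built from two unidirectional pipes. */
struct duplex_pipe {
    int req[2];  /* req[1]: sender writes, req[0]: device task reads */
    int resp[2]; /* resp[1]: device task writes, resp[0]: sender reads */
};

int duplex_pipe_open(struct duplex_pipe *ch)
{
    assert(ch != NULL);
    if (pipe(ch->req) < 0)
        return -1;
    if (pipe(ch->resp) < 0) {
        close(ch->req[0]);
        close(ch->req[1]);
        return -1;
    }
    return 0;
}

void duplex_pipe_close(struct duplex_pipe *ch)
{
    close(ch->req[0]);
    close(ch->req[1]);
    close(ch->resp[0]);
    close(ch->resp[1]);
}

/* One read-access round trip over the two pipes. Returns 0 on success. */
int demo_round_trip(struct duplex_pipe *ch, uint64_t addr, uint64_t *out)
{
    struct demo_msg req = { .addr = addr, .data = 0 };
    struct demo_msg seen, resp;
    const ssize_t sz = (ssize_t)sizeof(struct demo_msg);

    /* Sender side: post the request on the first pipe. */
    if (write(ch->req[1], &req, sizeof(req)) != sz)
        return -1;

    /* Device side: read the request and answer on the second pipe. */
    if (read(ch->req[0], &seen, sizeof(seen)) != sz)
        return -1;
    seen.data = seen.addr + 1; /* stand-in for real device emulation */
    if (write(ch->resp[1], &seen, sizeof(seen)) != sz)
        return -1;

    /* Sender side: collect the response. */
    if (read(ch->resp[0], &resp, sizeof(resp)) != sz)
        return -1;
    *out = resp.data;
    return 0;
}
```

The cost of this layout is that the registration interface would have to accept and track two descriptors per region instead of one.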



mkfifo(3) seems usable but creates a node on a filesystem.

socketpair(2) would work, but brings in the network stack when it's not
needed. The advantage is that some future use case might want to direct
ioregionfd over a real socket to a remote host, which would be cool.

Do you have an idea of the performance difference of socketpair(2)
compared to a custom fd?
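
For comparison, the socketpair(2) option gives a bidirectional channel with a single descriptor per side. A minimal sketch, with both ends in one process for brevity; the function name and the 8-byte payload are assumptions for illustration, not part of the proposed protocol:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `value` from sv[0] to sv[1], then send a reply back the other way
 * over the same pair of sockets. Returns 0 on success, -1 on error. */
int socketpair_round_trip(uint64_t value, uint64_t *reply)
{
    const ssize_t sz = (ssize_t)sizeof(uint64_t);
    uint64_t seen;
    int sv[2];
    int ret = -1;

    assert(reply != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Sender side writes the request on sv[0]. */
    if (write(sv[0], &value, sizeof(value)) != sz)
        goto out;

    /* Device side reads it on sv[1] and replies on the same fd. */
    if (read(sv[1], &seen, sizeof(seen)) != sz)
        goto out;
    seen ^= 0xff; /* stand-in for real device emulation */
    if (write(sv[1], &seen, sizeof(seen)) != sz)
        goto out;

    /* Sender side reads the reply back on sv[0]. */
    if (read(sv[0], reply, sizeof(*reply)) != sz)
        goto out;
    ret = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Timing a loop of such round trips against an equivalent custom fd would be one way to answer the performance question.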


It should be slower than a custom fd, and a UNIX socket should be faster than TIPC. Maybe we can have a custom fd, but it's better to leave the policy to userspace:

1) KVM should not place any restriction on the fd it uses; the user bears the risk if the fd is used wrongly, and the custom fd should be one of the choices
2) it's better not to have a virt-specific name (e.g. "KVM" or "ioregion")

Or I wonder whether we can attach an eBPF program when trapping MMIO/PIO and allow it to decide how to proceed?

Thanks



If it's negligible then using an arbitrary socket is more flexible and
sounds good.

Stefan



