On Mon, Feb 6, 2023 at 6:22 PM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
>
> On Mon, 6 Feb 2023 at 11:47, Eugenio Perez Martin <eperezma@xxxxxxxxxx> wrote:
> >
> > On Mon, Feb 6, 2023 at 3:21 PM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> > >
> > > On Mon, 6 Feb 2023 at 06:53, Eugenio Perez Martin <eperezma@xxxxxxxxxx> wrote:
> > > >
> > > > On Sun, Feb 5, 2023 at 2:57 PM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, 5 Feb 2023 at 03:15, Eugenio Perez Martin <eperezma@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Jan 27, 2023 at 4:18 PM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Dear QEMU, KVM, and rust-vmm communities,
> > > > > > > QEMU will apply for Google Summer of Code 2023
> > > > > > > (https://summerofcode.withgoogle.com/) and has been accepted into
> > > > > > > Outreachy May 2023 (https://www.outreachy.org/). You can now
> > > > > > > submit internship project ideas for QEMU, KVM, and rust-vmm!
> > > > > > >
> > > > > > > Please reply to this email by February 6th with your project ideas.
> > > > > > >
> > > > > > > If you have experience contributing to QEMU, KVM, or rust-vmm you can
> > > > > > > be a mentor. Mentors support interns as they work on their project. It's a
> > > > > > > great way to give back and you get to work with people who are just
> > > > > > > starting out in open source.
> > > > > > >
> > > > > > > Good project ideas are suitable for remote work by a competent
> > > > > > > programmer who is not yet familiar with the codebase. In
> > > > > > > addition, they are:
> > > > > > > - Well-defined - the scope is clear
> > > > > > > - Self-contained - there are few dependencies
> > > > > > > - Uncontroversial - they are acceptable to the community
> > > > > > > - Incremental - they produce deliverables along the way
> > > > > > >
> > > > > > > Feel free to post ideas even if you are unable to mentor the project.
> > > > > > > It doesn't hurt to share the idea!
> > > > > > >
> > > > > > > I will review project ideas and keep you up-to-date on QEMU's
> > > > > > > acceptance into GSoC.
> > > > > > >
> > > > > > > Internship program details:
> > > > > > > - Paid, remote work open source internships
> > > > > > > - GSoC projects are 175 or 350 hours, Outreachy projects are 30
> > > > > > > hrs/week for 12 weeks
> > > > > > > - Mentored by volunteers from QEMU, KVM, and rust-vmm
> > > > > > > - Mentors typically spend at least 5 hours per week during the coding period
> > > > > > >
> > > > > > > For more background on QEMU internships, check out this video:
> > > > > > > https://www.youtube.com/watch?v=xNVCX7YMUL8
> > > > > > >
> > > > > > > Please let me know if you have any questions!
> > > > > > >
> > > > > > > Stefan
> > > > > > >
> > > > > >
> > > > > > Appending the different ideas here.
> > > > >
> > > > > Hi Eugenio,
> > > > > Thanks for sharing your project ideas. I have added some questions
> > > > > below before we add them to the ideas list wiki page.
> > >
> > > Thanks for the discussion. Do you want to focus on 1 or 2 project
> > > ideas? 3 might be a bit much to mentor.
> > >
> >
> > Right, my idea was to reduce that amount afterwards just in case some
> > of them were rejected. But sure, we can filter out some if needed.
>
> Do you mean in case there is no realistic applicant? You can do that
> if you want, just keep in mind it may be more work for you during the
> application phase. If it turns out there is a strong applicant for
> each project idea you could see if someone else is willing to mentor
> the project(s) you don't have time for.
>

Good point, I'll discard the IN_ORDER project from the list.

> I'll post the project ideas once you've updated them.
>
> > > Please send an updated version of the project descriptions and I'll
> > > post it on the wiki.
> > >
> > > > >
> > > > > > VIRTIO_F_IN_ORDER feature support for virtio devices
> > > > > > ===
> > > > > > This was already a project the last year, and it produced a few series
> > > > > > upstream but was never merged. The previous series are totally useful
> > > > > > to start with, so it's not starting from scratch with them [1]:
> > > > >
> > > > > Has Zhi Guo stopped working on the patches?
> > > > >
> > > >
> > > > I can ask him for sure.
> > > >
> > > > > What is the state of the existing patches? What work remains to be done?
> > > > >
> > > >
> > > > There are some pending comments from upstream. However if somebody
> > > > starts it from scratch it needs time to review some of the VirtIO
> > > > standard to understand the virtio in_order feature, both in split and
> > > > packed vq.
> > >
> > > The intern will need to take ownership and deal with code review
> > > feedback for code they didn't write. That can be difficult for someone
> > > who is new unless the requested changes are easy to address.
> > >
> >
> > Indeed that is a very good point.
> >
> > > It's okay to start from scratch. You're in a better position than an
> > > applicant to decide whether that's the best approach.
> > >
> > > > >
> > > > > >
> > > > > > Summary
> > > > > > ---
> > > > > > Implement VIRTIO_F_IN_ORDER in QEMU and Linux (vhost and virtio drivers)
> > > > > >
> > > > > > The VIRTIO specification defines a feature bit (VIRTIO_F_IN_ORDER)
> > > > > > that devices and drivers can negotiate when the device uses
> > > > > > descriptors in the same order in which they were made available by the
> > > > > > driver.
> > > > > >
> > > > > > This feature can simplify device and driver implementations and
> > > > > > increase performance. For example, when VIRTIO_F_IN_ORDER is
> > > > > > negotiated, it may be easier to create a batch of buffers and reduce
> > > > > > DMA transactions when the device uses a batch of buffers.
> > > > > >
> > > > > > Currently the devices and drivers available in Linux and QEMU do not
> > > > > > support this feature. An implementation is available in DPDK for the
> > > > > > virtio-net driver.
> > > > > >
> > > > > > Goals
> > > > > > ---
> > > > > > Implement VIRTIO_F_IN_ORDER for a single device/driver in QEMU and
> > > > > > Linux (virtio-net or virtio-serial are good starting points).
> > > > > > Generalize your approach to the common virtio core code for split and
> > > > > > packed virtqueue layouts.
> > > > > > If time allows, support for the packed virtqueue layout can be added
> > > > > > to Linux vhost, QEMU's libvhost-user, and/or QEMU's virtio qtest code.
> > > > > >
> > > > > > Shadow Virtqueue missing virtio features
> > > > > > ===
> > > > > >
> > > > > > Summary
> > > > > > ---
> > > > > > Some VirtIO devices like virtio-net have a control virtqueue (CVQ)
> > > > > > that allows them to dynamically change a number of parameters like MAC
> > > > > > or number of active queues. Changes to passthrough devices using vDPA
> > > > > > using CVQ are inherently hard to track if CVQ is handled as
> > > > > > passthrough data queues, because qemu is not aware of that
> > > > > > communication for performance reasons. In this situation, qemu is not
> > > > > > able to migrate these devices, as it is not able to tell the actual
> > > > > > state of the device.
> > > > > >
> > > > > > Shadow Virtqueue (SVQ) allows qemu to offer an emulated queue to the
> > > > > > device, effectively forwarding the descriptors of that communication,
> > > > > > tracking the device internal state, and being able to migrate it to a
> > > > > > new destination qemu.
> > > > > >
> > > > > > To restore that state in the destination, SVQ is able to send these
> > > > > > messages as regular CVQ commands.
The code to understand and parse
> > > > > > virtio-net CVQ commands is already in qemu as part of its emulated
> > > > > > device, but the code to send the some of the new state is not, and
> > > > > > some features are missing. There is already code to restore basic
> > > > > > commands like mac or multiqueue, and it is easy to use it as a
> > > > > > template.
> > > > > >
> > > > > > Goals
> > > > > > ---
> > > > > > To implement missing virtio-net commands sending:
> > > > > > * VIRTIO_NET_CTRL_RX family, to control receive mode.
> > > > > > * VIRTIO_NET_CTRL_GUEST_OFFLOADS
> > > > > > * VIRTIO_NET_CTRL_VLAN family
> > > > > > * VIRTIO_NET_CTRL_MQ_HASH config
> > > > > > * VIRTIO_NET_CTRL_MQ_RSS config
> > > > >
> > > > > Is there enough work here for a 350 hour or 175 hour GSoC project?
> > > > >
> > > >
> > > > I think 175 hour should fit better. If needed more features can be
> > > > added (packed vq, ring reset, etc), but to start contributing a 175
> > > > hour should work.
> > > >
> > > > > The project description mentions "there is already code to restore
> > > > > basic commands like mac and multiqueue", please include a link.
> > > > >
> > > >
> > > > MAC address was merged with ASID support so the whole series is more
> > > > complicated than it should be. Here is it the most relevant patch:
> > > > * https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg00342.html
> > > >
> > > > MQ is way cleaner in that regard, and future series should look more
> > > > similar to this one:
> > > > * https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg906273.html
> > > >
> > > > > > Shadow Virtqueue performance optimization
> > > > > > ===
> > > > > > Summary
> > > > > > ---
> > > > > > To perform a virtual machine live migration with an external device to
> > > > > > qemu, qemu needs a way to know which memory the device modifies so it
> > > > > > is able to resend it.
Otherwise the guest would resume with invalid /
> > > > > > outdated memory in the destination.
> > > > > >
> > > > > > This is especially hard with passthrough hardware devices, as
> > > > > > transports like PCI imposes a few security and performance challenges.
> > > > > > As a method to overcome this for virtio devices, qemu can offer an
> > > > > > emulated virtqueue to the device, called Shadow Virtqueue (SVQ),
> > > > > > instead of allowing the device to communicate directly with the guest.
> > > > > > SVQ will then forward the writes to the guest, being the effective
> > > > > > writer in the guest memory and knowing when a portion of it needs to
> > > > > > be resent.
> > > > > >
> > > > > > As this is effectively breaking the passthrough and it adds extra
> > > > > > steps in the communication, this comes with a performance penalty in
> > > > > > some forms: Context switches, more memory reads and writes increasing
> > > > > > cache pressure, etc.
> > > > > >
> > > > > > At this moment the SVQ code is not optimized. It cannot forward
> > > > > > buffers in parallel using multiqueue and multithread, and it does not
> > > > > > use posted interrupts to notify the device skipping the host kernel
> > > > > > context switch (doorbells).
> > > > > >
> > > > > > The SVQ code requires minimal modifications for the multithreading,
> > > > > > and these are examples of multithreaded devices already like
> > > > > > virtio-blk which can be used as a template-alike. Regarding the posted
> > > > > > interrupts, DPDK is able to use them so that code can also be used as
> > > > > > a template.
> > > > > >
> > > > > > Goals
> > > > > > ---
> > > > > > * Measure the latest SVQ performance compared to non-SVQ.
> > > > >
> > > > > Which benchmark workload and which benchmarking tool do you recommend?
> > > > > Someone unfamiliar with QEMU and SVQ needs more details in order to
> > > > > know what to do.
> > > > >
> > > >
> > > > In my opinion netperf (TCP_STREAM & TCP_RR) or iperf equivalent +
> > > > testpmd in AF_PACKET mode should test these scenarios better. But
> > > > maybe upstream requests additional testings. Feedback on this would be
> > > > appreciated actually.
> > > >
> > > > My intention is not for the intern to develop new tests or anything
> > > > like that, they are just a means to justify the changes in SVQ. This
> > > > part would be very guided, or it can be offloaded from the project. So
> > > > if these tools are not enough descriptive maybe it's better to take
> > > > this out of the goals and add it to the description like that.
> > >
> > > Great, "netperf (TCP_STREAM & TCP_RR) or iperf equivalent + testpmd in
> > > AF_PACKET mode" is enough information.
> > >
> > > > >
> > > > > > * Add multithreading to SVQ, extracting the code from the Big QEMU Lock (BQL).
> > > > >
> > > > > What do you have in mind? Allowing individual virtqueues to be
> > > > > assigned to IOThreads? Or processing all virtqueues in a single
> > > > > IOThread (like virtio-blk and virtio-scsi do today)?
> > > > >
> > > >
> > > > My idea was to use iothreads. I thought virtio-blk and virtio-scsi
> > > > were done that way actually, is there a reason / advantage to use just
> > > > a single iothread?
> > >
> > > The reason for only supporting a single IOThread at the moment is
> > > thread-safety. There is multi-queue work in progress that will remove
> > > this limitation in the future.
> > >
> > > I sent a patch series proposing a command-line syntax for multi-queue here:
> > > https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg933001.html
> > >
> > > The idea is that the same syntax can be used by other devices that
> > > support mapping vqs to multiple IOThreads.
> > >
> >
> > Understood. I'll take a look, thanks!
> >
> > > > >
> > > > > > * Add posted thread capabilities to QEMU, following the model of DPDK to it.
> > > > >
> > > > > What is this about?
I thought KVM uses posted interrupts when
> > > > > available, so what needs to be done here? Please also include a link
> > > > > to the relevant DPDK code.
> > > > >
> > > >
> > > > The guest in KVM may use posted interrupts but SVQ code runs in
> > > > userland qemu :). There were no previous uses of HW posted interrupts
> > > > as far as I know so SVQ is only able to use vhost-vdpa kick eventfds
> > > > to notify queues. This has a performance penalty in the form of host
> > > > kernel context switches.
> > > >
> > > > If I'm not wrong this patch adds it to DPDK, but I may be missing
> > > > additional context or versions:
> > > > * https://lore.kernel.org/all/1579539790-3882-31-git-send-email-matan@xxxxxxxxxxxx/
> > > >
> > > > Please let me know if you need further information. Thanks!
> > >
> > > This patch does not appear related to posted interrupts because it's
> > > using the kickfd (available buffer notification) instead of the callfd
> > > (used buffer notification). It's the glue that forwards a virtqueue
> > > kick to hardware.
> > >
> >
> > I'm sorry, that's because I confused the terms in my head and I wanted
> > to say "host notifiers memory regions" or "hardware doorbell mapping".
> > Maybe it is clearer that way?
>
> The VIRTIO spec calls this memory the Queue Notify address.
>
> >
> > > I don't think that userspace available buffer notification
> > > interception can be bypassed in the SVQ model. SVQ needs to take a
> > > copy of available buffers so it knows the scatter-gather lists before
> > > forwarding the kick to the vDPA device. If the notification is
> > > bypassed then SVQ cannot reliably capture the scatter-gather list.
> > >
> > > I also don't think it's possible to bypass userspace in the used
> > > buffer notification path.
The vDPA used buffer notification must be
> > > intercepted so SVQ can mark memory pages in the scatter-gather list
> > > dirty before it fills in a guest used buffer and sends a guest used
> > > buffer notification.
> > >
> > > The guest used buffer notification should already be a VT-d Posted
> > > Interrupt on hardware that supports the feature. KVM takes care of
> > > that.
> > >
> > > I probably don't understand what the optimization idea is. You want
> > > SVQ to avoid a system call when sending vDPA available buffer
> > > notifications? That's not related to posted interrupts though, so I'm
> > > confused...
> > >
> >
> > That's right, you described the idea perfectly that way :). I'll
> > complete the projects summary but I'll be ok if you think it is not
> > qualified, we can leave that part out of the proposal.
>
> Thanks, I think I get it now. The task is to implement the dual of
> QEMU's virtio_queue_set_host_notifier_mr() so SVQ can perform
> virtqueue kicks on the vDPA device via memory store instructions.
>
> That's a cool feature and I think it should be included in the project idea.
>
> Stefan
>

Thanks for all the feedback, it makes the proposal way clearer. I'm
adding the updated proposals here; please let me know if you think they
need further modifications.

Shadow Virtqueue missing virtio features
===
Summary
---
Some VirtIO devices like virtio-net have a control virtqueue (CVQ)
that allows them to dynamically change a number of parameters like MAC
or number of active queues. Changes made through the CVQ to passthrough
vDPA devices are inherently hard to track if the CVQ is handled like the
passthrough data queues, because, for performance reasons, qemu is not
aware of that communication. In this situation, qemu is not able to
migrate these devices, as it is not able to tell the actual state of
the device.
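
To make the CVQ traffic described above a bit more concrete, here is a
minimal sketch of how a virtio-net control command is laid out: a
two-byte header (class, command), the command-specific data, and a
one-byte status written back by the device. The numeric values follow
the VIRTIO spec as far as I recall them and are redefined locally; the
helper names (cvq_build_cmd, cvq_cmd_succeeded) are made up for this
example and are not qemu functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Class / command / status values as listed in the VIRTIO spec for
 * virtio-net, redefined here so the sketch needs no kernel headers.
 * Double-check them against linux/virtio_net.h before relying on them. */
#define VIRTIO_NET_CTRL_RX            0
#define VIRTIO_NET_CTRL_RX_PROMISC    0
#define VIRTIO_NET_CTRL_MAC           1
#define VIRTIO_NET_CTRL_MAC_ADDR_SET  1
#define VIRTIO_NET_OK                 0
#define VIRTIO_NET_ERR                1

#define CVQ_HDR_LEN 2 /* one byte of class, one byte of command */

/* Hypothetical helper: serialize "class, command, data" into the
 * device-readable part of a CVQ request. Returns the number of bytes
 * written, or 0 if the command does not fit in the buffer. */
static size_t cvq_build_cmd(uint8_t *out, size_t out_len, uint8_t class_id,
                            uint8_t cmd, const void *data, size_t data_len)
{
    assert(out != NULL);
    assert(data != NULL || data_len == 0);

    if (out_len < CVQ_HDR_LEN || data_len > out_len - CVQ_HDR_LEN) {
        return 0;
    }
    out[0] = class_id;
    out[1] = cmd;
    if (data_len > 0) {
        memcpy(out + CVQ_HDR_LEN, data, data_len);
    }
    return CVQ_HDR_LEN + data_len;
}

/* Hypothetical helper: the device answers each command with a single
 * status byte in the device-writable part of the request. */
static int cvq_cmd_succeeded(uint8_t ack)
{
    return ack == VIRTIO_NET_OK;
}
```

Setting the MAC address, for example, would be class VIRTIO_NET_CTRL_MAC,
command VIRTIO_NET_CTRL_MAC_ADDR_SET and six bytes of data, so the
device-readable part is eight bytes long.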

Shadow Virtqueue (SVQ) allows qemu to offer an emulated queue to the
device, forwarding the descriptors of that communication, tracking the
device's internal state, and making it possible to migrate that state
to a new destination qemu.

To restore that state in the destination, SVQ is able to send these
messages as regular CVQ commands. The code to understand and parse
virtio-net CVQ commands is already in qemu as part of its emulated
device, but the code to send some of the new state is not, and some
features are missing. There is already code to restore basic commands
like mac [1] or multiqueue [2], and it is easy to use them as a
template.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg00342.html
[2] https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg906273.html

Goals
---
Implement sending of the missing virtio-net commands:
* VIRTIO_NET_CTRL_RX family, to control receive mode.
* VIRTIO_NET_CTRL_GUEST_OFFLOADS
* VIRTIO_NET_CTRL_VLAN family
* VIRTIO_NET_CTRL_MQ_HASH config
* VIRTIO_NET_CTRL_MQ_RSS config

Shadow Virtqueue performance optimization
===
Summary
---
To live migrate a virtual machine that uses a device external to qemu,
qemu needs a way to know which memory the device modifies so it is able
to resend it. Otherwise the guest would resume with invalid / outdated
memory in the destination.

This is especially hard with passthrough hardware devices, as
transports like PCI impose a few security and performance challenges.
As a method to overcome this for virtio devices, qemu can offer an
emulated virtqueue to the device, called Shadow Virtqueue (SVQ),
instead of allowing the device to communicate directly with the guest.
SVQ will then forward the writes to the guest, being the effective
writer in the guest memory and knowing when a portion of it needs to
be resent.
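
As a rough illustration of "knowing when a portion of it needs to be
resent": when the device returns a used buffer, SVQ can walk the
device-writable scatter-gather list and set one bit per touched guest
page in a dirty bitmap. This is only a sketch under assumptions that
are mine, not qemu's: 4 KiB pages, a flat bitmap indexed by guest page
number, and the invented names sg_entry, mark_dirty, svq_mark_used and
page_is_dirty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12 /* assumption: 4 KiB guest pages */

/* Hypothetical scatter-gather element: guest physical address + length. */
struct sg_entry {
    uint64_t gpa;
    uint64_t len;
};

/* Set the bit of every page overlapped by [gpa, gpa + len). */
static void mark_dirty(uint8_t *bitmap, size_t bitmap_pages,
                       uint64_t gpa, uint64_t len)
{
    if (len == 0) {
        return;
    }
    uint64_t first = gpa >> EX_PAGE_SHIFT;
    uint64_t last = (gpa + len - 1) >> EX_PAGE_SHIFT;

    for (uint64_t page = first; page <= last; page++) {
        assert(page < bitmap_pages);
        bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
    }
}

/* The device reports how many bytes it wrote into the buffer chain, so
 * only that prefix of the scatter-gather list is marked dirty. */
static void svq_mark_used(uint8_t *bitmap, size_t bitmap_pages,
                          const struct sg_entry *sg, size_t sg_num,
                          uint64_t written)
{
    for (size_t i = 0; i < sg_num && written > 0; i++) {
        uint64_t chunk = sg[i].len < written ? sg[i].len : written;

        mark_dirty(bitmap, bitmap_pages, sg[i].gpa, chunk);
        written -= chunk;
    }
}

static int page_is_dirty(const uint8_t *bitmap, uint64_t page)
{
    return (bitmap[page / 8] >> (page % 8)) & 1;
}
```

Note that a buffer which straddles a page boundary dirties both pages,
even if it is only a few bytes long.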

As this effectively breaks the passthrough and adds extra steps to the
communication, it comes with a performance penalty in several forms:
context switches, more memory reads and writes increasing cache
pressure, etc.

At this moment the SVQ code is not optimized. It cannot forward
buffers in parallel using multiqueue and multithread, and it does not
use the Queue Notify address to notify the device of available buffers,
so these notifications need to perform an extra host kernel context
switch.

The SVQ code requires minimal modifications for multithreading, and
there are already examples of multithreaded devices, like virtio-blk,
which can be used as a template. Proposals for the command-line syntax
to map virtio queues to iothreads have already been sent to the qemu
mailing list [1]. Regarding the use of the Queue Notify address, DPDK
is already able to use it, so that code can also be used as a template.

[1] https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg933001.html

Goals
---
* Measure the latest SVQ performance compared to non-SVQ with
standardized profiling tools like netperf (TCP_STREAM & TCP_RR) or
iperf equivalent + DPDK's testpmd in AF_PACKET mode.
* Add multithreading to SVQ, extracting the code from the Big QEMU
Lock (BQL).
* Add Queue Notify write capabilities to QEMU, following the model of
DPDK.
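
To illustrate the difference behind the last goal, here is a small
Linux-only sketch of the two ways of kicking a queue: writing to a kick
eventfd, which costs a system call, versus storing the queue index to a
mapped Queue Notify address, which is a plain memory write. It is not
qemu code: kick_eventfd and kick_doorbell are invented names, and in
the usage note the doorbell is an ordinary variable standing in for the
device memory that would have to be mapped in a real setup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Kick through an eventfd: every notification is a write() system
 * call, i.e. a switch into the host kernel. Returns 0 on success. */
static int kick_eventfd(int kick_fd)
{
    uint64_t one = 1;

    return write(kick_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Kick through a mapped doorbell: store the virtqueue index to the
 * notify address. The fence orders earlier ring updates before the
 * store, so the device does not see the kick ahead of the descriptors. */
static void kick_doorbell(volatile uint16_t *notify_addr, uint16_t vq_index)
{
    assert(notify_addr != NULL);
    atomic_thread_fence(memory_order_seq_cst);
    *notify_addr = vq_index;
}
```

For a quick experiment, kick_eventfd() can be called on a descriptor
from eventfd(0, EFD_NONBLOCK) and the counter read back with read(),
while kick_doorbell() can be pointed at a local uint16_t variable.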