On Fri, Dec 20, 2024 at 11:49 PM Timos Ampelikiotis <t.ampelikiotis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Dec 5, 2024 at 10:17 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>>
>> Adding Eugenio and YongJi.
>>
>> On Wed, Dec 4, 2024 at 11:38 PM <t.ampelikiotis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > From: Timos Ampelikiotis <t.ampelikiotis@xxxxxxxxxxxxxxxxxxxxxx>
>> >
>> > We would like to share with you an RFC for the Virtio-loopback
>> > technology which we have been working on at Virtual Open Systems in
>> > the context of the Automotive Grade Linux community (Software defined
>> > Vehicles expert group).
>> >
>> > We previously presented this activity (see [1]) and now we come back
>> > to you with the latest developments and updates.
>> >
>> > We believe that the technology is more mature today and we would like
>> > to assess the community interest in the technology itself and in
>> > merging the code.
>> >
>> > Below we provide a brief description of the technology, recent
>> > updates and a short comparison with vDUSE, which might be seen as a
>> > similar technology.
>> >
>> > 1. Overview:
>> > -------
>> >
>> > Virtio-loopback is a hardware abstraction layer (HAL) designed for
>> > non-virtualized environments based on virtio. The main objective is
>> > to enable applications to communicate with vhost-user devices in a
>> > non-virtualized environment.
>> >
>> > In more detail, the Virtio-loopback architecture consists of a new
>> > transport (Virtio-loopback), a user-space application (Adapter) and
>> > the vhost-user devices.
>> >
>> > The data path has been implemented using the "zero-copy" principle,
>>
>> This needs more clarification, for example, how could we prevent a
>> malicious userspace device from modifying kernel pages etc. Especially
>> considering that not all buffers occupy full pages.
>>
>> Actually, after chatting with Yong Ji, I've played with a zerocopy POC
>> for VDUSE, but it tries to do zerocopy only for page-aligned buffers:
>>
>> https://github.com/jasowang/net/tree/vduse-zerocopy
>>
>> The idea is to map the page directly to the userspace if the buffer
>> occupies a full page, and userspace is expected to recycle the page via
>> MADV_DONTNEED. It's far from mature but it can demonstrate the idea
>> somehow.
>
>
> I had a quick look into the POC and it seems like it improves
> performance (by reducing the copies) without compromising the security.

Note that it still leaves some things open, e.g. what if userspace
doesn't "recycle" pages via MADV_DONTNEED, etc.

>
> In general, it is similar to what we propose (though we focused on
> performance mainly). A brief introduction of how data are shared with
> the vhost-user devices without copies follows:
>
> In virtio-loopback, a page handler is assigned to the kernel memory
> when a vhost-user device tries to mmap a new memory region. Any time
> the vhost-user device then tries to actually access a page inside
> that memory region, a page fault is generated and the driver decides
> which is the corresponding page to be inserted into the vhost-user
> process.

Note that the page fault is really expensive; that's why in my POC I
hacked VDUSE to map the pages to avoid the #PF. But it needs a concept
like "owner" so VDUSE can grab the mm_struct of the owner process to do
that.

> Before sharing the page, we have intentionally left some
> space for security checks to be implemented in the future.
> At that point we can check if the page requested is related to the
> corresponding device and, if not, not share it with the user-space.
>
> As correctly said, that solution does not address the case of buffers
> being smaller than a page, and indeed, there is the chance for malicious
> applications to take advantage of the data padding the page after the buffer.
>
> Solving that issue would lead us to the solution you shared above. More
> specifically, creating a bounce-buffer approach for those cases. I didn't
> have the chance to benchmark the above POC yet, but I believe that it should
> introduce a performance advantage over the upstream vDUSE solution.

I had some tests through qsd. I can get at most a 40% improvement when
I'm using 128K writes etc. But when I was doing the POC with OVS-DPDK, I
didn't get good performance as it involved too many madvise() calls per
packet. I'm trying to find a good way to reduce the number of madvise()
calls.

>
> I understand that security is a very important point here.
> So my first objective for the virtio-loopback future development
> would be to address all the security concerns about the
> technology.
>
> Before that, I would like to assess your interest in the technology
> and understand:
> a) if the community would be interested to merge a new virtio-transport
> which does something similar to vDUSE but does not depend on vDPA?

If there's an advantage to bypassing vDPA, I would like to know, since
I'm seeking a way to accelerate VDUSE with zerocopy and the work would
be kind of duplicated.

> b) if it would be interesting to make the adapter compatible with vDUSE
> in order to add support for more devices in the current vDUSE
> implementation?

That would be welcomed.

Thanks
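
A note on the page-granularity constraint discussed in this thread: only the part of a buffer that covers whole pages can be mapped without exposing unrelated data, while the partial head and tail would have to go through a bounce buffer. Since madvise() takes an address range, one possible way to cut the per-packet call count is to merge adjacent page ranges before recycling them. The Python sketch below illustrates only this arithmetic. It is not code from the VDUSE POC or from virtio-loopback; the names split_buffer and coalesce and the 4096-byte default page size are assumptions made for illustration, and no real mapping or madvise() call is performed.

```python
def split_buffer(addr, length, page_size=4096):
    """Split the byte range [addr, addr + length) by page coverage.

    Returns (zero_copy, bounce):
      zero_copy: (start, end) of the whole pages fully covered by the
                 buffer, or None if the buffer covers no whole page.
      bounce:    list of (start, end) partial-page pieces that share a
                 page with other data and would need to be copied.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    end = addr + length
    first_full = -(-addr // page_size) * page_size  # round addr up
    last_full = (end // page_size) * page_size      # round end down
    if first_full >= last_full:
        # No whole page is covered: the entire buffer is bounced.
        return None, [(addr, end)]
    bounce = []
    if addr < first_full:
        bounce.append((addr, first_full))  # partial head page
    if last_full < end:
        bounce.append((last_full, end))    # partial tail page
    return (first_full, last_full), bounce


def coalesce(ranges):
    """Merge overlapping or adjacent (start, end) ranges.

    Each merged range could then be recycled with a single
    madvise(MADV_DONTNEED) call instead of one call per buffer
    (hypothetical usage, not performed here).
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, a 9000-byte buffer starting at offset 4000 covers the two whole pages in [4096, 12288), leaving a 96-byte head and a 712-byte tail to be bounced. Whether batching of this kind helps in the OVS-DPDK case depends on how often the recycled pages are actually contiguous, which this sketch does not measure.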