On Sat, Dec 21, 2024 at 9:44 PM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
>
> On Fri, 20 Dec 2024 at 10:56, Timos Ampelikiotis
> <t.ampelikiotis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Dec 4, 2024 at 7:18 PM Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> >>
> >> On Wed, 4 Dec 2024 at 10:38, <t.ampelikiotis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >> >
> >> > From: Timos Ampelikiotis <t.ampelikiotis@xxxxxxxxxxxxxxxxxxxxxx>
> >> >
> >> > This commit, based on the virtio MMIO driver, adds support
> >> > for dynamically allocated (platform) virtio devices. This
> >> > allows applications running in native environments to use
> >> > virtio drivers as a HAL and eventually communicate with
> >> > user-space drivers (implementing the vhost-user protocol).
> >> > [...]
> >
> > I will provide a brief description of how the memory mapping
> > mechanism works and is set up between the virtio transport and
> > the vhost-user device, and then comment on the security aspect.
> >
> > The adapter application sends a set of memory regions to the
> > vhost-user device, essentially tuples of [FD_1, HPA_1, SIZE],
> > [FD_2, HPA_2, SIZE], ..., [FD_N, HPA_N, SIZE]. SIZE is 1 GB, and
> > eventually HPA_N + SIZE should be bigger than the RAM size.
> >
> > When the vhost-user device receives those regions, it calls the
> > mmap syscall on each of those FDs individually, and the driver
> > assigns a page-fault handler to all of those VM regions and
> > returns. At the end of this process the vhost-user device has
> > generated the following mappings:
> > [VA_1, HPA_1, SIZE_1], [VA_2, HPA_2, SIZE_2], ..., [VA_N, HPA_N, SIZE_N].
> >
> > That, more or less, is the initialization process, which will make
> > sense once we describe below how the driver gives the vhost-user
> > device access to data buffers.
> >
> > When the vhost-user device tries to access a VA_X obtained from
> > those mmaps, a page fault occurs, and the driver then needs to
> > provide the corresponding page.
At that point, the transport driver
> > goes through the records, recognizes that VA_X lies in
> > [VA_Y, VA_Y + SIZE), and constructs the required PFN by computing
> > HPA_X = HPA_Y + offset, where offset = VA_X - VA_Y. Finally, the
> > driver inserts the page into the process that caused the page
> > fault with "vmf_insert_page".
> >
> > Security-wise, this approach has to fulfill the following two
> > requirements:
> >
> > a) The first one is that we need to check efficiently whether
> > the calling process (the vhost-user device) may access the
> > requested pages. For example, assuming that we run virtio-blk
> > on top of the virtio-loopback transport, the vhost-user blk
> > device should be able to access only pages related to the
> > virtio-blk virtqueues and vrings. That challenge is solvable,
> > and we can add those checks without changing the architecture.
>
> Once pages have been faulted in, how does the kernel revoke access
> when the request is complete? I'm thinking of the scenario where a
> page is used for virtio DMA but is later reused for other purposes.
> Userspace must not retain access once the page is reused for non-DMA
> purposes. I guess it requires madvise(MADV_DONTNEED) or something
> similar.
>
> > b) The second one is the fact that the current memory mapping
> > approach cannot guarantee safety for the remaining data on a
> > page when the virtio buffers do not cover its whole size.
> > In order to address that security issue, we should implement
> > an approach such as the one proposed by Jason Wang below:
> > https://github.com/jasowang/net/tree/vduse-zerocopy
> >
> > In this way we utilize the bounce-buffer idea and can guarantee
> > not to expose the remaining data when buffer_size < PAGE_SIZE,
> > and insert the page into the process otherwise.
> >
> > Overall, we are aware of those two security points; the first
> > will require the implementation of additional checks, and the
> > second a modification of the data-sharing model to avoid
> > exposing kernel data.
> >
> > If you think the overall approach is interesting, we can
> > continue the discussion and target future work at addressing
> > the challenges described above.
>
> With regards to the overall approach, please work with Jason since
> there is an overlap with VDUSE. If you guys agree on how VDUSE and
> loopback fit together, then I'm happy.

Right, I think it would be better if we can find a way in VDUSE first,
as a lot of the work/code could be reused.

Thanks