Re: [RFC PATCH V2 2/2] vhost: device IOTLB API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 04/27/2016 07:45 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 25, 2016 at 10:34:34AM +0800, Jason Wang wrote:
>> This patch tries to implement an device IOTLB for vhost. This could be
>> used with for co-operation with userspace(qemu) implementation of DMA
>> remapping.
>>
>> The idea is simple. When vhost meets an IOTLB miss, it will request
>> the assistance of userspace to do the translation, this is done
>> through:
>>
>> - Fill the translation request in a preset userspace address (This
>>   address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY).
>> - Notify userspace through eventfd (This eventfd was set through ioctl
>>   VHOST_SET_IOTLB_FD).
> Why use an eventfd for this?

The aim is to implement the API all through ioctls.

>  We use them for interrupts because
> that happens to be what kvm wants, but here - why don't we
> just add a generic support for reading out events
> on the vhost fd itself?

I've considered this approach, but what's the advantages of this? I mean
looks like all other ioctls could be done through vhost fd
reading/writing too.

>
>> - device IOTLB were started and stopped through VHOST_RUN_IOTLB ioctl
>>
>> When userspace finishes the translation, it will update the vhost
>> IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of
>> snooping the IOTLB invalidation of IOMMU IOTLB and use
>> VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost.
> There's one problem here, and that is that VQs still do not undergo
> translation.  In theory VQ could be mapped in such a way
> that it's not contigious in userspace memory.

I'm not sure I get the issue, current vhost API support setting
desc_user_addr, used_user_addr and avail_user_addr independently. So
looks ok? If not, looks not a problem to device IOTLB API itself.

>
>
>> Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> What limits amount of entries that kernel keeps around?

It depends on guest working set I think. Looking at
http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html:

- For 2MB page size in guest, it suggests hugepages=1024
- For 1GB page size, it suggests a hugepages=4

So I choose 2048 to make sure it can cover this.

> Do we want at least a mod parameter for this?

Maybe.

>
>> ---
>>  drivers/vhost/net.c        |   6 +-
>>  drivers/vhost/vhost.c      | 301 +++++++++++++++++++++++++++++++++++++++------
>>  drivers/vhost/vhost.h      |  17 ++-
>>  fs/eventfd.c               |   3 +-
>>  include/uapi/linux/vhost.h |  35 ++++++
>>  5 files changed, 320 insertions(+), 42 deletions(-)
>>

[...]

>> +struct vhost_iotlb_entry {
>> +	__u64 iova;
>> +	__u64 size;
>> +	__u64 userspace_addr;
> Alignment requirements?

The API does not require any alignment. Will add a comment for this.

>
>> +	struct {
>> +#define VHOST_ACCESS_RO      0x1
>> +#define VHOST_ACCESS_WO      0x2
>> +#define VHOST_ACCESS_RW      0x3
>> +		__u8  perm;
>> +#define VHOST_IOTLB_MISS           1
>> +#define VHOST_IOTLB_UPDATE         2
>> +#define VHOST_IOTLB_INVALIDATE     3
>> +		__u8  type;
>> +#define VHOST_IOTLB_INVALID        0x1
>> +#define VHOST_IOTLB_VALID          0x2
>> +		__u8  valid;
> why do we need this flag?

Useless, will remove.

>
>> +		__u8  u8_padding;
>> +		__u32 padding;
>> +	} flags;
>> +};
>> +
>> +struct vhost_vring_iotlb_entry {
>> +	unsigned int index;
>> +	__u64 userspace_addr;
>> +};
>> +
>>  struct vhost_memory_region {
>>  	__u64 guest_phys_addr;
>>  	__u64 memory_size; /* bytes */
>> @@ -127,6 +153,15 @@ struct vhost_memory {
>>  /* Set eventfd to signal an error */
>>  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
>>  
>> +/* IOTLB */
>> +/* Specify an eventfd file descriptor to signle on IOTLB miss */
> typo

Will fix it.

>
>> +#define VHOST_SET_VRING_IOTLB_CALL _IOW(VHOST_VIRTIO, 0x23, struct      \
>> +                                        vhost_vring_file)
>> +#define VHOST_SET_VRING_IOTLB_REQUEST _IOW(VHOST_VIRTIO, 0x25, struct   \
>> +                                           vhost_vring_iotlb_entry)
>> +#define VHOST_UPDATE_IOTLB _IOW(VHOST_VIRTIO, 0x24, struct vhost_iotlb_entry)
>> +#define VHOST_RUN_IOTLB _IOW(VHOST_VIRTIO, 0x26, int)
>> +
> Is the assumption that userspace must dedicate a thread to running the iotlb? 
> I dislike this one.
> Please support asynchronous APIs at least optionally to make
> userspace make its own threading decisions.

Nope, my qemu patches does not use a dedicated thread. This API is used
to start or top DMAR according to e.g whether guest enable DMAR for
intel IOMMU.

>
>>  /* VHOST_NET specific defines */
>>  
>>  /* Attach virtio net ring to a raw socket, or tap device.
> Don't we need a feature bit for this?

Yes we need it. The feature bit is not considered in this patch and
looks like it was still under discussion. After we finalize it, I will add.

> Are we short on feature bits? If yes maybe it's time to add
> something like PROTOCOL_FEATURES that we have in vhost-user.
>

I believe it can just work like VERSION_1, or is there anything I missed?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux