Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel

Sheng Yang <sheng@xxxxxxxxxxxxxxx> · Thu, 30 Dec 2010 17:13:07 +0800

On Thursday 30 December 2010 16:52:58 Michael S. Tsirkin wrote:
> On Thu, Dec 30, 2010 at 04:24:10PM +0800, Sheng Yang wrote:
> > On Thursday 30 December 2010 16:15:32 Michael S. Tsirkin wrote:
> > > On Thu, Dec 30, 2010 at 03:55:10PM +0800, Sheng Yang wrote:
> > > > On Thursday 30 December 2010 15:47:48 Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 30, 2010 at 03:32:42PM +0800, Sheng Yang wrote:
> > > > > > On Wednesday 29 December 2010 17:28:24 Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 29, 2010 at 04:55:19PM +0800, Sheng Yang wrote:
> > > > > > > > On Wednesday 29 December 2010 16:31:35 Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Dec 29, 2010 at 03:18:13PM +0800, Sheng Yang wrote:
> > > > > > > > > > On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
> > > > > > > > > > > On 12/22/2010 10:44 AM, Sheng Yang wrote:
> > > > > > > > > > > > Then we can support mask bit operation of assigned
> > > > > > > > > > > > devices now.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > @@ -3817,14 +3819,16 @@ static int
> > > > > > > > > > > > emulator_write_emulated_onepage(unsigned long addr,
> > > > > > > > > > > > 
> > > > > > > > > > > >   mmio:
> > > > > > > > > > > >   	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa,
> > > > > > > > > > > >   	*(u64 *)val);
> > > > > > > > > > > > 
> > > > > > > > > > > > +	r = vcpu_mmio_write(vcpu, gpa, bytes, val);
> > > > > > > > > > > > 
> > > > > > > > > > > >   	/*
> > > > > > > > > > > >   	
> > > > > > > > > > > >   	 * Is this MMIO handled locally?
> > > > > > > > > > > >   	 */
> > > > > > > > > > > > 
> > > > > > > > > > > > -	if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
> > > > > > > > > > > > +	if (!r)
> > > > > > > > > > > > 
> > > > > > > > > > > >   		return X86EMUL_CONTINUE;
> > > > > > > > > > > >   	
> > > > > > > > > > > >   	vcpu->mmio_needed = 1;
> > > > > > > > > > > > 
> > > > > > > > > > > > -	vcpu->run->exit_reason = KVM_EXIT_MMIO;
> > > > > > > > > > > > +	vcpu->run->exit_reason = (r == -ENOTSYNC) ?
> > > > > > > > > > > > +		KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
> > > > > > > > > > > 
> > > > > > > > > > > This isn't very pretty, exit_reason should be written
> > > > > > > > > > > in vcpu_mmio_write().  I guess we can refactor it
> > > > > > > > > > > later.
> > > > > > > > > > 
> > > > > > > > > > Sure.
> > > > > > > > > > 
> > > > > > > > > > > > +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV	    (1<<  0)
> > > > > > > > > > > > +
> > > > > > > > > > > > +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE	    (1<<  8)
> > > > > > > > > > > > +#define KVM_MSIX_MMIO_TYPE_BASE_PBA	    (1<<  9)
> > > > > > > > > > > > +
> > > > > > > > > > > > +#define KVM_MSIX_MMIO_TYPE_DEV_MASK	    0x00ff
> > > > > > > > > > > > +#define KVM_MSIX_MMIO_TYPE_BASE_MASK	    0xff00
> > > > > > > > > > > 
> > > > > > > > > > > Any explanation of these?
> > > > > > > > > > 
> > > > > > > > > > I chose to use assigned device id instead of one specific
> > > > > > > > > > table id, because every device should got at most one MSI
> > > > > > > > > > MMIO(the same should applied to vfio device as well), and
> > > > > > > > > > if we use specific table ID, we need way to associate
> > > > > > > > > > with the device anyway, to perform mask/unmask or other
> > > > > > > > > > operation. So I think it's better to use device ID here
> > > > > > > > > > directly.
> > > > > > > > > 
> > > > > > > > > Table id will be needed to make things work for emulated
> > > > > > > > > devices.
> > > > > > > > 
> > > > > > > > I suppose even emulated device should got some kind of
> > > > > > > > id(BDF)?
> > > > > > > 
> > > > > > > Not that I know. Look at how irqfd is defined for example,
> > > > > > > or how interrupts are sent through a gsi.
> > > > > > > I would like to make the interface be able to support that.
> > > > > > > 
> > > > > > > > I think that is
> > > > > > > > enough for identification, which is already there, so we
> > > > > > > > don't need to allocate another ID for the device - because
> > > > > > > > one device would got at most one MSI-X MMIO, then use BDF or
> > > > > > > > other device specific ID should be quite straightforward.
> > > > > > > 
> > > > > > > So you propose allocating ids for emulated devices?
> > > > > > > OK. How will we map e.g. irqfds to these?
> > > > > > 
> > > > > > I don't understand. I've checked virtio-pci.c which is using
> > > > > > irqfd, and it's still a PCI device, and still have BDF, right?
> > > > > > 
> > > > > > Also, what we want is a way to identify the MSI-X MMIO. For
> > > > > > assigned device, we use BDF, then we can easily identify the
> > > > > > MMIO as well as the device. For others, even they don't have
> > > > > > BDF(I don't think so, because MSI-X is a part of PCI, and every
> > > > > > PCI device has BDF), what you need is an ID, no matter what it
> > > > > > is and how it defined. QEmu can get the allocation done, and the
> > > > > > type field in this API can still tell which kind of ID/devices
> > > > > > they are, then determine how to deal with them.
> > > > > 
> > > > > Yes, the PCI device can be identified with e.g. BFD
> > > > > (won't work for multi-domain but then we can write an allocator
> > > > > maybe). But how will we inject these interrupts?
> > > > > We can do this now with GSI ioctl or map GSI to irqfd
> > > > > and inject with irqfd write.
> > > > 
> > > > I suppose it's not in the scope of this patch...
> > > 
> > > This is why I suggested mapping GSI to msix.
> > > 
> > > > But I think you can still do
> > > > this, everything is the same as before. QEmu can read from table to
> > > > get data/address pair, then program the routing table, etc.
> > > 
> > > Yes, fine, but mask is the problem :)
> > > When qemu/irqfd injects an interrupt and it's masked,
> > > guest should not be interrupted.
> > 
> > I think this should be done by other APIs(to map GSI with MSI-X entry).
> > And of course you can introduce one GSI type which require one ID and
> > one entry number to eject later, but it's not in the scope of this
> > patch.
> 
> What you propose is adding an API to map GSI to a table ID + entry
> number?  Like this?
> 
> 1. map table ID to address+length
> 2. map GSI to table ID + entry #
> 
> But if we have such an API we don't need anything else
> as there's already an API to map assigned device interrupts
> to GSIs?

Assigned device's API is still there, and it's also not touched by this patch. You 
can modify it for other devices of course.

For assigned devices, this API is enough. And it can also support other devices, 
emulate their MSI-X MMIO. I think that's enough for this patch.

--
regards
Yang, Sheng

> 
> > --
> > regards
> > Yang, Sheng
> > 
> > > > --
> > > > regards
> > > > Yang, Sheng
> > > > 
> > > > > > > > > My idea was this: we have the device id in
> > > > > > > > > kvm_assigned_msix_entry already. Just put table id and
> > > > > > > > > entry number in
> > > > > > > > > kvm_irq_routing_entry (create a new gsi type for this).
> > > > > > > > > The result will also work for irqfd because these are
> > > > > > > > > mapped to gsi.
> > > > > > > > > 
> > > > > > > > > > And for the table and pba address, it's due to the
> > > > > > > > > > mapping in userspace may know the guest MSI-X table
> > > > > > > > > > address and PBA address at different time(due to
> > > > > > > > > > different BAR, refer to the code in
> > > > > > > > > > assigned_dev_iomem_map() of qemu). So I purposed this
> > > > > > > > > > API to allow each of them can be passed to kernel space
> > > > > > > > > > individually.
> > > > > > > > > > 
> > > > > > > > > > > > +struct kvm_msix_mmio_user {
> > > > > > > > > > > > +	__u32 dev_id;
> > > > > > > > > > > > +	__u16 type;
> > > > > > > > > > > > +	__u16 max_entries_nr;
> > > > > > > > > > > > +	__u64 base_addr;
> > > > > > > > > > > > +	__u64 base_va;
> > > > > > > > > > > > +	__u64 flags;
> > > > > > > > > > > > +	__u64 reserved[4];
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > +int kvm_assigned_device_update_msix_mask_bit(struct
> > > > > > > > > > > > kvm *kvm, +				int assigned_dev_id, int 
entry, u32
> > 
> > flag)
> > 
> > > > > > > > > > > > +{
> > > > > > > > > > > 
> > > > > > > > > > > Need a better name for 'flag' (and make it a bool).
> > > > > > > > > > > 
> > > > > > > > > > > > +	int r = -EFAULT;
> > > > > > > > > > > > +	struct kvm_assigned_dev_kernel *adev;
> > > > > > > > > > > > +	int i;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	if (!irqchip_in_kernel(kvm))
> > > > > > > > > > > > +		return r;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	mutex_lock(&kvm->lock);
> > > > > > > > > > > > +	adev =
> > > > > > > > > > > > kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
> > > > > > > > > > > > +				      assigned_dev_id);
> > > > > > > > > > > > +	if (!adev)
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	for (i = 0; i<  adev->entries_nr; i++)
> > > > > > > > > > > > +		if (adev->host_msix_entries[i].entry == entry) {
> > > > > > > > > > > > +			if (flag)
> > > > > > > > > > > > +				disable_irq_nosync(
> > > > > > > > > > > > +					adev->host_msix_entries[i].vector);
> > > > > > > > > > > > +			else
> > > > > > > > > > > > +				enable_irq(adev-
> > >
> > >host_msix_entries[i].vector);
> > >
> > > > > > > > > > > > +			r = 0;
> > > > > > > > > > > > +			break;
> > > > > > > > > > > > +		}
> > > > > > > > > > > > +out:
> > > > > > > > > > > > +	mutex_unlock(&kvm->lock);
> > > > > > > > > > > > +	return r;
> > > > > > > > > > > > +}
> > > > > > > > > > > > 
> > > > > > > > > > > > @@ -1988,6 +2008,12 @@ static int
> > > > > > > > > > > > kvm_dev_ioctl_create_vm(void)
> > > > > > > > > > > > 
> > > > > > > > > > > >   		return r;
> > > > > > > > > > > >   	
> > > > > > > > > > > >   	}
> > > > > > > > > > > >   
> > > > > > > > > > > >   #endif
> > > > > > > > > > > > 
> > > > > > > > > > > > +	r = kvm_register_msix_mmio_dev(kvm);
> > > > > > > > > > > > +	if (r<  0) {
> > > > > > > > > > > > +		kvm_put_kvm(kvm);
> > > > > > > > > > > > +		return r;
> > > > > > > > > > > > +	}
> > > > > > > > > > > 
> > > > > > > > > > > Shouldn't this be part of individual
> > > > > > > > > > > KVM_REGISTER_MSIX_MMIO calls?
> > > > > > > > > > 
> > > > > > > > > > In fact this MMIO device is more like global one for the
> > > > > > > > > > VM, not for every devices. It should handle all MMIO
> > > > > > > > > > from all MSI-X enabled devices, so I put it in the VM
> > > > > > > > > > init/destroy process.
> > > > > > > > > > 
> > > > > > > > > > > > +static int msix_table_mmio_read(struct kvm_io_device
> > > > > > > > > > > > *this, gpa_t addr, int len, +				void 
*val)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +	struct kvm_msix_mmio_dev *mmio_dev =
> > > > > > > > > > > > +		container_of(this, struct kvm_msix_mmio_dev,
> > > > > > > > > > > > table_dev); +	struct kvm_msix_mmio *mmio;
> > > > > > > > > > > > +	int idx, ret = 0, entry, offset, r;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	mutex_lock(&mmio_dev->lock);
> > > > > > > > > > > > +	idx = get_mmio_table_index(mmio_dev, addr, len);
> > > > > > > > > > > > +	if (idx<  0) {
> > > > > > > > > > > > +		ret = -EOPNOTSUPP;
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +	}
> > > > > > > > > > > > +	if ((addr&  0x3) || (len != 4&&  len != 8))
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > 
> > > > > > > > > > > What about (addr & 4) && (len == 8)? Is it supported?
> > > > > > > > > > > It may cross entry boundaries.
> > > > > > > > > > 
> > > > > > > > > > Should not supported. But I haven't found words on the
> > > > > > > > > > PCI spec for it. So I didn't add this check.
> > > > > > > > > 
> > > > > > > > > IMPLEMENTATION NOTE
> > > > > > > > > MSI-X Memory Space Structures in Read/Write Memory
> > > > > > > > > 
> > > > > > > > > ....
> > > > > > > > > 
> > > > > > > > > For all accesses to MSI-X Table and MSI-X PBA fields,
> > > > > > > > > software must use aligned full
> > > > > > > > > DWORD or aligned full QWORD transactions; otherwise, the
> > > > > > > > > result is undefined.
> > > > > > > > 
> > > > > > > > Yes, this one is enough, I would add the checking.
> > > > > > > > 
> > > > > > > > > > > > +	mmio =&mmio_dev->mmio[idx];
> > > > > > > > > > > > +
> > > > > > > > > > > > +	entry = (addr - mmio->table_base_addr) /
> > > > > > > > > > > > PCI_MSIX_ENTRY_SIZE; +	offset = addr&  0xf;
> > > > > > > > > > > > +	r = copy_from_user(val, (void
> > > > > > > > > > > > *)(mmio->table_base_va + +			entry *
> > > > > > > > > > > > PCI_MSIX_ENTRY_SIZE + offset), len);
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > +	if (r)
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +out:
> > > > > > > > > > > > +	mutex_unlock(&mmio_dev->lock);
> > > > > > > > > > > > +	return ret;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static int msix_table_mmio_write(struct
> > > > > > > > > > > > kvm_io_device *this, gpa_t addr, +				int 
len, const
> > > > > > > > > > > > void *val) +{
> > > > > > > > > > > > +	struct kvm_msix_mmio_dev *mmio_dev =
> > > > > > > > > > > > +		container_of(this, struct kvm_msix_mmio_dev,
> > > > > > > > > > > > table_dev); +	struct kvm_msix_mmio *mmio;
> > > > > > > > > > > > +	int idx, entry, offset, ret = 0, r = 0;
> > > > > > > > > > > > +	gpa_t entry_base;
> > > > > > > > > > > > +	u32 old_ctrl, new_ctrl;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	mutex_lock(&mmio_dev->lock);
> > > > > > > > > > > > +	idx = get_mmio_table_index(mmio_dev, addr, len);
> > > > > > > > > > > > +	if (idx<  0) {
> > > > > > > > > > > > +		ret = -EOPNOTSUPP;
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +	}
> > > > > > > > > > > > +	if ((addr&  0x3) || (len != 4&&  len != 8))
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +	mmio =&mmio_dev->mmio[idx];
> > > > > > > > > > > > +	entry = (addr - mmio->table_base_addr) /
> > > > > > > > > > > > PCI_MSIX_ENTRY_SIZE; +	entry_base =
> > > > > > > > > > > > mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
> > > > > > > > > > > > +	offset = addr&  0xF; +
> > > > > > > > > > > > +	if (copy_from_user(&old_ctrl,
> > > > > > > > > > > > +			entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL,
> > > > > > > > > > > > +			sizeof old_ctrl))
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > 
> > > > > > > > > > > get_user() is easier.
> > > > > > > > > > > 
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/* No allow writing to other fields when entry is
> > > > > > > > > > > > unmasked */ +	if (!(old_ctrl&
> > > > > > > > > > > > PCI_MSIX_ENTRY_CTRL_MASKBIT)&& +	    offset !=
> > > > > > > > > > > > PCI_MSIX_ENTRY_VECTOR_CTRL)
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	if (copy_to_user(entry_base + offset, val, len))
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > 
> > > > > > > > > > > > +
> > > > > > > > > > > > +	if (copy_from_user(&new_ctrl,
> > > > > > > > > > > > +			entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL,
> > > > > > > > > > > > +			sizeof new_ctrl))
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > 
> > > > > > > > > > > put_user()
> > > > > > > > > > > 
> > > > > > > > > > > > +
> > > > > > > > > > > > +	if ((offset<  PCI_MSIX_ENTRY_VECTOR_CTRL&&  len ==
> > > > > > > > > > > > 4)
> > > > > > > > > > > > 
> > > > > > > > > > > > || +	    (offset<  PCI_MSIX_ENTRY_DATA&&  len == 8))
> > > > > > > > > > > > 
> > > > > > > > > > > > +		ret = -ENOTSYNC;
> > > > > > > > > > > > +	if (old_ctrl == new_ctrl)
> > > > > > > > > > > > +		goto out;
> > > > > > > > > > > > +	if (!(old_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > > > > > > > > > > > +			(new_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > > > > > > > > > +		r = update_msix_mask_bit(mmio_dev->kvm, mmio,
> > 
> > entry,
> > 
> > > > > > > > > > > > 1); +	else if ((old_ctrl&
> > > > > > > > > > > > PCI_MSIX_ENTRY_CTRL_MASKBIT)&& +			!(new_ctrl&
> > > > > > > > > > > > PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > > > > > > > > > +		r = update_msix_mask_bit(mmio_dev->kvm, mmio,
> > 
> > entry,
> > 
> > > > > > > > > > > > 0); +	if (r || ret)
> > > > > > > > > > > > +		ret = -ENOTSYNC;
> > > > > > > > > > > > +out:
> > > > > > > > > > > > +	mutex_unlock(&mmio_dev->lock);
> > > > > > > > > > > > +	return ret;
> > > > > > > > > > > > +}
> > > > > > > > > > > 
> > > > > > > > > > > blank line...
> > > > > > > > > > > 
> > > > > > > > > > > > +static const struct kvm_io_device_ops
> > > > > > > > > > > > msix_mmio_table_ops = { +	.read     =
> > > > > > > > > > > > msix_table_mmio_read,
> > > > > > > > > > > > +	.write    = msix_table_mmio_write,
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > > ++
> > > > > > > > > > > > +int kvm_vm_ioctl_register_msix_mmio(struct kvm *kvm,
> > > > > > > > > > > > +				    struct kvm_msix_mmio_user *mmio_user)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +	struct kvm_msix_mmio_dev *mmio_dev
> > > > > > > > > > > > =&kvm->msix_mmio_dev; +	struct kvm_msix_mmio *mmio =
> > > > > > > > > > > > NULL;
> > > > > > > > > > > > +	int r = 0, i;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	mutex_lock(&mmio_dev->lock);
> > > > > > > > > > > > +	for (i = 0; i<  mmio_dev->mmio_nr; i++) {
> > > > > > > > > > > > +		if (mmio_dev->mmio[i].dev_id ==
> > > > > > > > > > > > mmio_user->dev_id&& +		    (mmio_dev->mmio[i].type&
> > > > 
> > > > KVM_MSIX_MMIO_TYPE_DEV_MASK)
> > > > 
> > > > > > ==
> > > > > > 
> > > > > > > > > > > > +		    (mmio_user->type&
> > 
> > KVM_MSIX_MMIO_TYPE_DEV_MASK)) {
> > 
> > > > > > > > > > > > +			mmio =&mmio_dev->mmio[i];
> > > > > > > > > > > > +			if (mmio->max_entries_nr != mmio_user-
> > > > >
> > > > >max_entries_nr)
> > > > >
> > > > > > {
> > > > > > 
> > > > > > > > > > > > +				r = -EINVAL;
> > > > > > > > > > > > +				goto out;
> > > > > > > > > > > > +			}
> > > > > > > > > > > > +			break;
> > > > > > > > > > > > +		}
> > > > > > > > > > > > +	}
> > > > > > > > > > > > +	if (!mmio) {
> > > > > > > > > > > > +		if (mmio_dev->mmio_nr == KVM_MSIX_MMIO_MAX) {
> > > > > > > > > > > > +			r = -ENOSPC;
> > > > > > > > > > > > +			goto out;
> > > > > > > > > > > > +		}
> > > > > > > > > > > > +		mmio =&mmio_dev->mmio[mmio_dev->mmio_nr];
> > > > > > > > > > > > +		mmio_dev->mmio_nr++;
> > > > > > > > > > > > +	}
> > > > > > > > > > > > +	mmio->max_entries_nr = mmio_user->max_entries_nr;
> > > > > > > > > > > 
> > > > > > > > > > > Sanity check to avoid overflow.
> > > > > > > > > > > 
> > > > > > > > > > > > +	mmio->dev_id = mmio_user->dev_id;
> > > > > > > > > > > > +	mmio->flags = mmio_user->flags;
> > > > > > > > > > > 
> > > > > > > > > > > Check for unsupported bits (all of them at present?)
> > > > > > > > > > > 
> > > > > > > > > > > > +	if ((mmio_user->type&  KVM_MSIX_MMIO_TYPE_DEV_MASK)
> > > > > > > > > > > > == +			KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV)
> > > > > > > > > > > > +		mmio->type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	if ((mmio_user->type& 
> > > > > > > > > > > > KVM_MSIX_MMIO_TYPE_BASE_MASK) ==
> > > > > > > > > > > > +			KVM_MSIX_MMIO_TYPE_BASE_TABLE) {
> > > > > > > > > > > > +		mmio->type |= KVM_MSIX_MMIO_TYPE_BASE_TABLE;
> > > > > > > > > > > > +		mmio->table_base_addr = mmio_user->base_addr;
> > > > > > > > > > > > +		mmio->table_base_va = mmio_user->base_va;
> > > > > > > > > > > 
> > > > > > > > > > > Check for va in kernel space.
> > > > > > > > > > > 
> > > > > > > > > > > > +	} else if ((mmio_user->type&
> > > > > > > > > > > > KVM_MSIX_MMIO_TYPE_BASE_MASK) ==
> > > > > > > > > > > > +			KVM_MSIX_MMIO_TYPE_BASE_PBA) {
> > > > > > > > > > > > +		mmio->type |= KVM_MSIX_MMIO_TYPE_BASE_PBA;
> > > > > > > > > > > > +		mmio->pba_base_addr = mmio_user->base_addr;
> > > > > > > > > > > > +		mmio->pba_base_va = mmio_user->base_va;
> > > > > > > > > > > > +	}
> > > > > > > > > > > > +out:
> > > > > > > > > > > > +	mutex_unlock(&mmio_dev->lock);
> > > > > > > > > > > > +	return r;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +
> > > > > > > > > > > 
> > > > > > > > > > > In all, looks reasonable.  I'd like to see
> > > > > > > > > > > documentation for it, and review from the pci people. 
> > > > > > > > > > > Alex, mst?
> > > > > > > > > 
> > > > > > > > > Some general comments:
> > > > > > > > > PBA isn't supported in this version, which is OK, but let's
> > > > > > > > > not add a capability until it is, and let's not try to
> > > > > > > > > guess what the interface will look like. I think keeping
> > > > > > > > > PBA in userspace will be hard because it needs to be
> > > > > > > > > modified from interrupt context. Removing the PBA stub
> > > > > > > > > will make the interface simpler.
> > > > > > > > 
> > > > > > > > The API only get the PBA address now which should be fine.
> > > > > > > > And we still have threaded irq and tasklet for accessing the
> > > > > > > > userspace for interrupt handler...
> > > > > > > 
> > > > > > > I don't think it's going to work: we are not
> > > > > > > in the context of the right process. Further
> > > > > > > I think we should keep the option of
> > > > > > > reading the PBA status from the device or host kernel open.
> > > > > > > And generally having an interface
> > > > > > > for functionality we don't implement is not a good idea:
> > > > > > > you don't know whether you really can support the interface you
> > > > > > > promised.
> > > > > > 
> > > > > > Well, I don't know if we want to read PBA from device directly.
> > > > > > To me it's not a good idea because the real device has nothing
> > > > > > to do with the one we show to the guest. At least direct
> > > > > > accessing the mask bits of real device would be very dangerous.
> > > > > > Avi?
> > > > > > 
> > > > > > --
> > > > > > regards
> > > > > > Yang, Sheng
> > > > > 
> > > > > I am not really suggesting this. What I say is PBA is unimplemented
> > > > > let us not commit to an interface yet.
> > > > > 
> > > > > > > > --
> > > > > > > > regards
> > > > > > > > Yang, Sheng
> > > > > > > > 
> > > > > > > > > > Would add the API document soon.
> > > > > > > > > > 
> > > > > > > > > > --
> > > > > > > > > > regards
> > > > > > > > > > Yang, Sheng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html