On 11/23/2010 10:30 AM, Yang, Sheng wrote:
On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote: > On 11/23/2010 08:35 AM, Yang, Sheng wrote: > > On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote: > > > On 11/23/2010 08:09 AM, Yang, Sheng wrote: > > > > Hi Avi, > > > > > > > > I've purposed the following API for mask bit support. > > > > > > > > The main point is, QEmu can know which entries are enabled(by > > > > pci_enable_msix()). And for enabled entries, kernel own it, > > > > including MSI data/address and mask bit(routing table and mask > > > > bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and > > > > it can sync with them if it want to do so). > > > > > > > > Before entries are enabled, QEmu can still use it's own MSI > > > > table(because we didn't contain these kind of information in > > > > kernel, and it's unnecessary for kernel). > > > > > > > > The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query > > > > one entry didn't exist in kernel - or we can simply return -EINVAL > > > > for it. > > > > > > > > I suppose it would be rare for QEmu to use this interface to get > > > > the context of entry(the only case I think is when MSI-X disable > > > > and QEmu need to sync the context), so performance should not be > > > > an issue. > > > > > > > > What's your opinion? > > > > > > > > > #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct > > > > > kvm_msix_entry) > > > > > > Need SET_MSIX_ENTRY for live migration as well. > > > > Current we don't support LM with VT-d... > > Isn't this work useful for virtio as well? Yeah, but won't be included in this patchset.
What API changes are needed? I'd like to see the complete API.
> > > What about the pending bits? > > > > We didn't cover it here - and it's in another MMIO space(PBA). Of course > > we can add more flags for it later. > > When an entry is masked, we need to set the pending bit for it > somewhere. I guess this is broken in the existing code (without your > patches)? Even with my patch, we didn't support the pending bit. It would always return 0 now. What we supposed to do(after my patch checked in) is to check IRQ_PENDING flag of irq_desc->status(if the entry is masked), and return the result to userspace. That would involve some core change, like to export irq_to_desc(). I don't think it would be accepted soon, so would push mask bit first.
The API needs to be compatible with the pending bit, even if we don't implement it now. I want to reduce the rate of API changes.
> > > > Also need a new exit reason to tell userspace that an msix entry has > > > changed, so userspace can update mappings. > > > > I think we don't need it. Whenever userspace want to get one mapping > > which is an enabled MSI-X entry, it can check it with the API > > above(which is quite rare, because kernel would handle all of them when > > guest is accessing them). If it's a disabled entry, the context inside > > userspace MMIO record is the correct one(and only one). The only place I > > think QEmu need to sync is when MSI-X is about to disabled, QEmu need to > > update it's own MMIO record. > > So in-kernel handling of mmio would be decided per entry? I'm trying to > simplify this, and simplest thing is - all or nothing. So you would like to handle all MSI-X MMIO in kernel?
Yes. Writes to address or data would be handled by: - recording it into the shadow msix table - notifying userspace that msix entry x changed Reads would be handled in kernel from the shadow msix table. So instead of - guest reads/writes msix - kvm filters mmio, implements some, passes others to userspace we have - guest reads/writes msix - kvm implements all - some writes generate an additional notification to userspace -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html