On Tue, Nov 23, 2010 at 02:47:33PM +0200, Avi Kivity wrote: > On 11/23/2010 10:30 AM, Yang, Sheng wrote: > >On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote: > >> On 11/23/2010 08:35 AM, Yang, Sheng wrote: > >> > On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote: > >> > > On 11/23/2010 08:09 AM, Yang, Sheng wrote: > >> > > > Hi Avi, > >> > > > > >> > > > I've purposed the following API for mask bit support. > >> > > > > >> > > > The main point is, QEmu can know which entries are enabled(by > >> > > > pci_enable_msix()). And for enabled entries, kernel own it, > >> > > > including MSI data/address and mask bit(routing table and mask > >> > > > bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and > >> > > > it can sync with them if it want to do so). > >> > > > > >> > > > Before entries are enabled, QEmu can still use it's own MSI > >> > > > table(because we didn't contain these kind of information in > >> > > > kernel, and it's unnecessary for kernel). > >> > > > > >> > > > The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query > >> > > > one entry didn't exist in kernel - or we can simply return -EINVAL > >> > > > for it. > >> > > > > >> > > > I suppose it would be rare for QEmu to use this interface to get > >> > > > the context of entry(the only case I think is when MSI-X disable > >> > > > and QEmu need to sync the context), so performance should not be > >> > > > an issue. > >> > > > > >> > > > What's your opinion? > >> > > > > >> > > > > #define KVM_GET_MSIX_ENTRY _IOWR(KVMIO, 0x7d, struct > >> > > > > kvm_msix_entry) > >> > > > >> > > Need SET_MSIX_ENTRY for live migration as well. > >> > > >> > Current we don't support LM with VT-d... > >> > >> Isn't this work useful for virtio as well? > > > >Yeah, but won't be included in this patchset. > > What API changes are needed? I'd like to see the complete API. > > >> > > What about the pending bits? > >> > > >> > We didn't cover it here - and it's in another MMIO space(PBA). Of course > >> > we can add more flags for it later. > >> > >> When an entry is masked, we need to set the pending bit for it > >> somewhere. I guess this is broken in the existing code (without your > >> patches)? > > > >Even with my patch, we didn't support the pending bit. It would always return 0 > >now. What we supposed to do(after my patch checked in) is to check IRQ_PENDING flag > >of irq_desc->status(if the entry is masked), and return the result to userspace. > > > >That would involve some core change, like to export irq_to_desc(). I don't think > >it would be accepted soon, so would push mask bit first. > > The API needs to be compatible with the pending bit, even if we > don't implement it now. I want to reduce the rate of API changes. > > >> > >> > > Also need a new exit reason to tell userspace that an msix entry has > >> > > changed, so userspace can update mappings. > >> > > >> > I think we don't need it. Whenever userspace want to get one mapping > >> > which is an enabled MSI-X entry, it can check it with the API > >> > above(which is quite rare, because kernel would handle all of them when > >> > guest is accessing them). If it's a disabled entry, the context inside > >> > userspace MMIO record is the correct one(and only one). The only place I > >> > think QEmu need to sync is when MSI-X is about to disabled, QEmu need to > >> > update it's own MMIO record. > >> > >> So in-kernel handling of mmio would be decided per entry? I'm trying to > >> simplify this, and simplest thing is - all or nothing. > > > >So you would like to handle all MSI-X MMIO in kernel? > > Yes. Writes to address or data would be handled by: > - recording it into the shadow msix table > - notifying userspace that msix entry x changed > Reads would be handled in kernel from the shadow msix table. > > So instead of > > - guest reads/writes msix > - kvm filters mmio, implements some, passes others to userspace > > we have > > - guest reads/writes msix > - kvm implements all > - some writes generate an additional notification to userspace One small proposal in addition: since all accesses are done from guest anyway, the shadow table can/should be stored using userspace memory, reducing the kernel memory overhead of the feature from up to 4K per MSIX table to just 8 bytes. Active entries can be cached in kernel memory. > > -- > error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html