On 19/07/12 01:23, Michael S. Tsirkin wrote: > On Wed, Jul 18, 2012 at 11:17:12PM +1000, Alexey Kardashevskiy wrote: >> On 18/07/12 22:43, Michael S. Tsirkin wrote: >>> On Thu, Jun 21, 2012 at 09:39:10PM +1000, Alexey Kardashevskiy wrote: >>>> Added (msi|msix)_set_message() functions. >>>> >>>> Currently msi_notify()/msix_notify() write to these vectors to >>>> signal the guest about an interrupt so the correct values have to >>>> written there by the guest or QEMU. >>>> >>>> For example, POWER guest never initializes MSI/MSIX vectors, instead >>>> it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on >>>> POWER we have to initialize MSI/MSIX message from QEMU. >>>> >>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>> >>> So guests do enable MSI through config space, but do >>> not fill in vectors? >> >> Yes. msix_capability_init() calls arch_setup_msi_irqs() which does everything it needs to do (i.e. calls hypervisor) before msix_capability_init() writes PCI_MSIX_FLAGS_ENABLE to the PCI_MSIX_FLAGS register. >> >> These vectors are the PCI bus addresses, the way they are set is specific for a PCI host controller, I do not see why the current scheme is a bug. > > I won't work with any real PCI device, will it? Real pci devices expect > vectors to be written into their memory. Yes. And the hypervisor does this. On POWER (at least book3s - server powerpc, the whole config space kitchen is hidden behind RTAS (kind of bios). For the guest, this RTAS is implemented in hypervisor, for the host - in the system firmware. So powerpc linux does not have to have PHB drivers. Kinda cool. Usual powerpc server is running without the host linux at all, it is running a hypervisor called pHyp. And every guest knows that it is a guest, there is no full machine emulation, it is para-virtualization. In power-kvm, we replace that pHyp with the host linux and now QEMU plays a hypervisor role. Some day We will move the hypervisor to the host kernel completely (?) but now it is in QEMU. >>> Very strange. Are you sure it's not >>> just a guest bug? How does it work for other PCI devices? >> >> Did not get the question. It works the same for every PCI device under POWER guest. > > I mean for real PCI devices. > >>> Can't we just fix guest drivers to program the vectors properly? >>> >>> Also pls address the comment below. >> >> Comment below. >> >>> Thanks! >>> >>>> --- >>>> hw/msi.c | 13 +++++++++++++ >>>> hw/msi.h | 1 + >>>> hw/msix.c | 9 +++++++++ >>>> hw/msix.h | 2 ++ >>>> 4 files changed, 25 insertions(+) >>>> >>>> diff --git a/hw/msi.c b/hw/msi.c >>>> index 5233204..cc6102f 100644 >>>> --- a/hw/msi.c >>>> +++ b/hw/msi.c >>>> @@ -105,6 +105,19 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit) >>>> return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32); >>>> } >>>> >>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg) >>>> +{ >>>> + uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev)); >>>> + bool msi64bit = flags & PCI_MSI_FLAGS_64BIT; >>>> + >>>> + if (msi64bit) { >>>> + pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address); >>>> + } else { >>>> + pci_set_long(dev->config + msi_address_lo_off(dev), msg.address); >>>> + } >>>> + pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data); >>>> +} >>>> + >>> >>> Please add documentation. Something like >>> >>> /* >>> * Special API for POWER to configure the vectors through >>> * a side channel. Should never be used by devices. >>> */ >> >> >> It is useful for any para-virtualized environment I believe, is not it? >> For s390 as well. Of course, if it supports PCI, for example, what I am not sure it does though :) > > I expect the normal guest to program the address into MSI register using > config accesses, same way that it enables MSI/MSIX. > Why POWER does it differently I did not yet figure out but I hope > this weirdness is not so widespread. In para-virt I would expect the guest not to touch config space at all. At least it should use one interface rather than two but this is how it is. >>>> bool msi_enabled(const PCIDevice *dev) >>>> { >>>> return msi_present(dev) && >>>> diff --git a/hw/msi.h b/hw/msi.h >>>> index 75747ab..6ec1f99 100644 >>>> --- a/hw/msi.h >>>> +++ b/hw/msi.h >>>> @@ -31,6 +31,7 @@ struct MSIMessage { >>>> >>>> extern bool msi_supported; >>>> >>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg); >>>> bool msi_enabled(const PCIDevice *dev); >>>> int msi_init(struct PCIDevice *dev, uint8_t offset, >>>> unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask); >>>> diff --git a/hw/msix.c b/hw/msix.c >>>> index ded3c55..5f7d6d3 100644 >>>> --- a/hw/msix.c >>>> +++ b/hw/msix.c >>>> @@ -45,6 +45,15 @@ static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector) >>>> return msg; >>>> } >>>> >>>> +void msix_set_message(PCIDevice *dev, int vector, struct MSIMessage msg) >>>> +{ >>>> + uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE; >>>> + >>>> + pci_set_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR, msg.address); >>>> + pci_set_long(table_entry + PCI_MSIX_ENTRY_DATA, msg.data); >>>> + table_entry[PCI_MSIX_ENTRY_VECTOR_CTRL] &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; >>>> +} >>>> + >>>> /* Add MSI-X capability to the config space for the device. */ >>>> /* Given a bar and its size, add MSI-X table on top of it >>>> * and fill MSI-X capability in the config space. >>>> diff --git a/hw/msix.h b/hw/msix.h >>>> index 50aee82..26a437e 100644 >>>> --- a/hw/msix.h >>>> +++ b/hw/msix.h >>>> @@ -4,6 +4,8 @@ >>>> #include "qemu-common.h" >>>> #include "pci.h" >>>> >>>> +void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg); >>>> + >>>> int msix_init(PCIDevice *pdev, unsigned short nentries, >>>> MemoryRegion *bar, >>>> unsigned bar_nr, unsigned bar_size); >>>> -- >>>> 1.7.10 >>>> >>>> ps. double '-' and git version is an end-of-patch scissor as I read somewhere, cannot recall where exactly :) >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 21/06/12 20:56, Jan Kiszka wrote: >>>>> On 2012-06-21 12:50, Alexey Kardashevskiy wrote: >>>>>> On 21/06/12 20:38, Jan Kiszka wrote: >>>>>>> On 2012-06-21 12:28, Alexey Kardashevskiy wrote: >>>>>>>> On 21/06/12 17:39, Jan Kiszka wrote: >>>>>>>>> On 2012-06-21 09:18, Alexey Kardashevskiy wrote: >>>>>>>>>> >>>>>>>>>> agrhhh. sha1 of the patch changed after rebasing :) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Added (msi|msix)_(set|get)_message() function for whoever might >>>>>>>>>> want to use them. >>>>>>>>>> >>>>>>>>>> Currently msi_notify()/msix_notify() write to these vectors to >>>>>>>>>> signal the guest about an interrupt so the correct values have to >>>>>>>>>> written there by the guest or QEMU. >>>>>>>>>> >>>>>>>>>> For example, POWER guest never initializes MSI/MSIX vectors, instead >>>>>>>>>> it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on >>>>>>>>>> POWER we have to initialize MSI/MSIX message from QEMU. >>>>>>>>>> >>>>>>>>>> As only set* function are required by now, the "get" functions were added >>>>>>>>>> or made public for a symmetry. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>>>>>>>>> --- >>>>>>>>>> hw/msi.c | 29 +++++++++++++++++++++++++++++ >>>>>>>>>> hw/msi.h | 2 ++ >>>>>>>>>> hw/msix.c | 11 ++++++++++- >>>>>>>>>> hw/msix.h | 3 +++ >>>>>>>>>> 4 files changed, 44 insertions(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/hw/msi.c b/hw/msi.c >>>>>>>>>> index 5233204..9ad84a4 100644 >>>>>>>>>> --- a/hw/msi.c >>>>>>>>>> +++ b/hw/msi.c >>>>>>>>>> @@ -105,6 +105,35 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit) >>>>>>>>>> return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> +MSIMessage msi_get_message(PCIDevice *dev) >>>>>>>>> >>>>>>>>> MSIMessage msi_get_message(PCIDevice *dev, unsigned vector) >>>>>>>> >>>>>>>> >>>>>>>> Who/how/why is going to calculate the vector here? >>>>>>>> >>>>>>>>> >>>>>>>>>> +{ >>>>>>>>>> + uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev)); >>>>>>>>>> + bool msi64bit = flags & PCI_MSI_FLAGS_64BIT; >>>>>>>>>> + MSIMessage msg; >>>>>>>>>> + >>>>>>>>>> + if (msi64bit) { >>>>>>>>>> + msg.address = pci_get_quad(dev->config + msi_address_lo_off(dev)); >>>>>>>>>> + } else { >>>>>>>>>> + msg.address = pci_get_long(dev->config + msi_address_lo_off(dev)); >>>>>>>>>> + } >>>>>>>>>> + msg.data = pci_get_word(dev->config + msi_data_off(dev, msi64bit)); >>>>>>>>> >>>>>>>>> And I have this here in addition: >>>>>>>>> >>>>>>>>> unsigned int nr_vectors = msi_nr_vectors(flags); >>>>>>>>> ... >>>>>>>>> >>>>>>>>> if (nr_vectors > 1) { >>>>>>>>> msg.data &= ~(nr_vectors - 1); >>>>>>>>> msg.data |= vector; >>>>>>>>> } >>>>>>>>> >>>>>>>>> See PCI spec and existing code. >>>>>>>> >>>>>>>> >>>>>>>> What for? I really do not get it why someone might want to read something but not real value. >>>>>>>> What PCI code should I look? >>>>>>> >>>>>>> I'm not sure what your use case for reading the message is. For KVM >>>>>>> device assignment it is preparing an alternative message delivery path >>>>>>> for MSI vectors. And for this we will need vector notifier support for >>>>>>> MSI as well. You can check the MSI-X code for corresponding use cases of >>>>>>> msix_get_message. >>>>>> >>>>>>> And when we already have msi_get_message, another logical use case is >>>>>>> msi_notify. See msix.c again. >>>>>> >>>>>> Aaaa. >>>>>> >>>>>> I have no case for reading the message. All I need is writing. And I want it public as I want to use >>>>>> it from hw/spapr_pci.c. You suggested to add reading, I added "get" to be _symmetric_ to "set" >>>>>> ("get" returns what "set" wrote). You want a different thing which I can do but it is not >>>>>> msi_get_message(), it is something like msi_prepare_message(MSImessage msg) or >>>>>> msi_set_vector(uint16_t data) or simply internal kitchen of msi_notify(). >>>>>> >>>>>> Still can do what you suggested, it just does not seem right. >>>>> >>>>> It is right - when looking at it from a different angle. ;) >>>>> >>>>> I don't mind if you add msi_get_message now or leave this to me. Likely >>>>> the latter is better as you have no use case for msi_get_message (and >>>>> also msix_get_message!) outside of their modules, thus we should not >>>>> export those functions anyway. -- Alexey -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html