Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

Auger Eric <eric.auger@xxxxxxxxxx> · Wed, 13 Mar 2019 11:16:04 +0100

Hi Leo,

On 3/13/19 11:01 AM, Leo Yan wrote:
> On Wed, Mar 13, 2019 at 04:00:48PM +0800, Leo Yan wrote:
> 
> [...]
> 
>> - The second question is for GICv2m.  If I understand correctly, when
>>   passing a PCI-e device through to the guest OS, we should create
>>   the data path below for PCI-e devices in the guest OS:
>>                                                             +--------+
>>                                                          -> | Memory |
>>     +-----------+    +------------------+    +-------+  /   +--------+
>>     | Net card  | -> | PCI-e controller | -> | IOMMU | -
>>     +-----------+    +------------------+    +-------+  \   +--------+
>>                                                          -> | MSI    |
>>                                                             | frame  |
>>                                                             +--------+
>>
>>   Since the master is now the network card/PCI-e controller rather
>>   than the CPU, there are no 2 stages for memory accesses
>>   (VA->IPA->PA).  In this case, should we configure the IOMMU (SMMU)
>>   for the guest OS's address translation before switching from host
>>   to guest?  Or does the SMMU also have two stages of memory mapping?
>>
>>   Another thing that confuses me: I can see the MSI frame is mapped
>>   to the GIC's physical address in the host OS, so the PCI-e device
>>   can send messages correctly to the MSI frame.  But for the guest OS,
>>   the MSI frame is mapped to an IPA memory region, and this region is
>>   used to emulate the GICv2 MSI frame rather than the hardware MSI
>>   frame; so will any access from PCI-e to this region trap to the
>>   hypervisor on the CPU side, so that the KVM hypervisor can emulate
>>   (and inject) the interrupt for the guest OS?
>>
>>   Essentially, I want to check what the expected behaviour is for the
>>   GICv2 MSI frame when we want to pass one PCI-e device through to
>>   the guest OS and the PCI-e device has one static MSI frame.
> 
> The blog [1] has the explanation below for my question about mapping
> the IOVA to the hardware MSI address.  But I searched for the flag
> VFIO_DMA_FLAG_MSI_RESERVED_IOVA and it isn't in the mainline kernel.
> I might be missing something here; have the related patches been
> merged into the mainline kernel?

Yes, all the mechanics for passthrough/MSI on ARM are upstream. The blog
page is outdated. The kernel now allocates IOVAs for MSI doorbells
arbitrarily within this region:

#define MSI_IOVA_BASE                   0x8000000
#define MSI_IOVA_LENGTH                 0x100000

and userspace is no longer involved in passing a usable reserved IOVA
region.
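
To make the window concrete, here is a minimal sketch (not kernel code)
of how one might check whether a DMA range overlaps the MSI doorbell
window. The two constants are the ones quoted above; the helper name
overlaps_msi_window() is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window constants as quoted above from the kernel. */
#define MSI_IOVA_BASE   0x8000000
#define MSI_IOVA_LENGTH 0x100000

/*
 * Hypothetical helper (not a kernel or VFIO API): returns true if the
 * half-open range [iova, iova + size) intersects the MSI IOVA window
 * [MSI_IOVA_BASE, MSI_IOVA_BASE + MSI_IOVA_LENGTH).
 */
static bool overlaps_msi_window(uint64_t iova, uint64_t size)
{
    const uint64_t win_start = MSI_IOVA_BASE;
    const uint64_t win_end = win_start + MSI_IOVA_LENGTH; /* exclusive */

    assert(size > 0);                  /* empty ranges are not meaningful */
    assert(iova <= UINT64_MAX - size); /* iova + size must not wrap */

    return iova < win_end && iova + size > win_start;
}
```

With these values the window covers 0x8000000 up to, but not including,
0x8100000. On a running system the reserved window for a device should
also be visible in /sys/kernel/iommu_groups/<n>/reserved_regions, where
it is listed with type "msi".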

Thanks

Eric
> 
> 'We reuse the VFIO DMA MAP ioctl to pass this reserved IOVA region. A
> new flag (VFIO_DMA_FLAG_MSI_RESERVED_IOVA) is introduced to
> differentiate such reserved IOVA from RAM IOVA. Then the base/size of
> the window is passed to the IOMMU driver through a new function
> introduced in the IOMMU API.
> 
> The IOVA allocation within the supplied reserved IOVA window is
> performed on-demand, when the MSI controller composes/writes the MSI
> message in the PCIe device. Also the IOMMU mapping between the newly
> allocated IOVA and the backdoor address page is done at that time. The
> MSI controller uses a new function introduced in the IOMMU API to
> allocate the IOVA and create an IOMMU mapping.
>  
> So there are adaptations needed at VFIO, IOMMU and MSI controller
> level. The extension of the IOMMU API still is under discussion. Also
> changes at MSI controller level need to be consolidated.'
> 
> P.s. I also tried two tools, qemu and kvmtool; neither can deliver
> interrupts for the network card in the guest OS.
> 
> Thanks,
> Leo Yan
> 
> [1] https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/
> 