Re: Unable to pass SATA controller to VM with intel_iommu=igfx_off

Thank you very much for the detailed and invaluable information!

In the meantime, it has turned out that host and VM are stable, but that
performance is a disaster, so the success is a Pyrrhic victory. I have
connected two disks to the controller and copied a large file between
them from within the VM. The speed was about 3 MB/s, which is obviously
far too slow to be usable.
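For completeness, the test was nothing sophisticated - roughly something
like the following from within the VM (mount points are placeholders,
not my real ones):

dd if=/mnt/src/largefile of=/mnt/dst/largefile bs=1M oflag=direct status=progress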

In any case, I will follow your advice and buy another adapter card,
probably one with the ASM1061. But it would still be interesting (and
hopefully possible) to figure out what is going on here. Thus, ...

On 09.01.2018 23:41, Alex Williamson wrote:
>> Remaining questions:
>>
>> - What could make the Seabios hang for such a long time upon every boot?
> 
> Perhaps some sort of problem with the device ROM.  Assuming you're not
> booting the VM from the assigned device, you can add rombar=0 to the
> qemu vfio-pci device options to disable the ROM.

I have now tried that. Sadly, rombar=0 did not change anything. SeaBIOS
still hangs during boot for a minute or so, then the VM boots up without
problems. SeaBIOS hangs whether or not disks are connected to the
controller.
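For reference, the vfio-pci device line I used for this test looked
roughly like the following (the PCI address is only a placeholder):

-device vfio-pci,host=03:00.0,x-no-kvm-intx=on,rombar=0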

> Setting a bootindex
> both on the vfio-pci device and the actual boot device could help.

Unfortunately, setting the bootindex on the actual boot device is not
possible since the boot device's image format is raw. Trying to set a
bootindex makes qemu emit the following error message upon start:

"[...] Block format 'raw' does not support the option 'bootindex'"

I have then set the bootindex of the vfio device to 9; that did not
change anything. Additionally, I have tried -boot strict=on; that did
not change anything either.
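Maybe I am simply attaching the option in the wrong place: as far as I
understand, bootindex is a property of the front-end device, not of the
-drive itself. A minimal sketch of what I will try next (placeholder
names, assuming a virtio disk):

-drive file=/path/to/boot.img,format=raw,if=none,id=bootdisk
-device virtio-blk-pci,drive=bootdisk,bootindex=1
-device vfio-pci,host=03:00.0,bootindex=9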

I seem to remember a message from you on another list (or maybe the same
one) where you were helping someone with a similar problem. If memory
serves, you suggested that the SeaBIOS might be too old. Could that be
the case for me, too?

> I
> think the '-boot c' option is deprecated, explicitly specifying a
> emulated controller would be better.

I have re-read qemu's manual for my host system, and of course, you are
right :-) I'll try to figure out how to set the boot order in a
non-deprecated fashion (but still without using bootindex).

>  Also, using q35 vs 440fx for the VM machine
> type makes no difference, q35 is, if anything, more troublesome imo.

This is interesting. I have re-tested and confirmed my initial findings:
When I use -machine pc,... instead of -machine q35,..., qemu emits the
following error when starting:

-device
ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1:
Bus 'pcie.0' not found

This is one of the few things I thought I had understood. According to
my research, the q35 model establishes a root PCI Express bus by default
(pcie.0), while the pc (= 440fx) model establishes only a root PCI bus
by default (pci.0).

The device which I would like to pass through is a PCI-E device.
According to https://github.com/qemu/qemu/blob/master/docs/pcie.txt (as
far as I have understood it), we should put PCI-E devices only on PCI-E
buses, but not on PCI buses.

If I used -machine pc, there would be only a PCI (non-Express) root bus,
and although we could plug the pass-through device in there, we
shouldn't do it (or should we?). Did I get this wrong?
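Just so we are talking about the same thing, the topology I am currently
using looks roughly like this (a sketch only; the PCI address is a
placeholder):

-machine q35,accel=kvm
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1
-device vfio-pci,host=03:00.0,bus=root.1,addr=00.0,x-no-kvm-intx=on

i.e. the vfio-pci device sits behind the ioh3420 root port on pcie.0.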

>> - Could you please shortly explain what the option "x-no-kvm-intx=on"
>> does and why I need it in this case?
> 
> INTx is the legacy PCI interrupt (ie. INTA, INTB, INTC, INTD).  This is
> a level triggered interrupt therefore it continues to assert until the
> device is serviced.  It must therefore be masked on the host while it
> is handled by the guest.  There are two paths we can use for injecting
> this interrupt into the VM and unmasking it on the host once the VM
> samples the interrupt.  When KVM is used for acceleration, these happen
> via direct connection between the vfio-pci and kvm modules using
> eventfds and irqfds.  The x-no-kvm-intx option disables that path,
> instead bouncing out to QEMU to do the same.

I see. Thank you very much for explaining so clearly.

> TBH, I have no idea why this would make it work.  The QEMU path is
> slower than the KVM path, but they should be functionally identical.

Possibly the device design is indeed so badly broken that the two paths
are not functionally identical in this case. I suppose that the
difference in speed between the two paths is not great enough to explain
the extremely slow data transfer in the VM?

>> - Could you please shortly explain what exactly it wants to tell me when
>> it says that it disables INT xx, and notable if this is a bad thing I
>> should take care of?
> 
> The "Disabling IRQ XX, nobody cared" message means that the specified
> IRQ asserted many times without any of the interrupt handlers claiming
> that it was their device asserting it.  It then masks the interrupt at
> the APIC.  With device assignment this can mean that the mechanism we
> use to mask the device doesn't work for that device.  There's a
> vfio-pci module option you can use to have vfio-pci mask the interrupt
> at the APIC rather than the device, nointxmask=1.  The trouble with
> this option is that it can only be used with exclusive interrupts, so
> if any other devices share the interrupt, starting the VM will fail.
> As a test, you can unbind conflicting devices from their drivers
> (assuming non-critical devices).

Again, thank you very much for the clear explanation. I'll investigate
and report back in a few hours.
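For my own reference, this is what I understand I should try (the file
name and the PCI address of a conflicting device are placeholders):

# /etc/modprobe.d/vfio.conf
options vfio-pci nointxmask=1

# temporarily unbind a non-critical device sharing the IRQ from its driver
echo 0000:00:1f.2 > /sys/bus/pci/devices/0000:00:1f.2/driver/unbind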

> The troublesome point here is that regardless of x-no-kvm-intx, the
> kernel uses the same masking technique for the device, so it's unclear
> why one works and the other does not.

>> - What about the "x-no-kvm-msi" and "x-no-kvm-msix" options? Would it be
>> better to use them as well? I couldn't find any sound information about
>> what exactly they do (Note: Initially, I had all three of those
>> "x-no..." options active, which made the VM boot the first time, and
>> later out of curiosity found out that "x-no-kvm-intx" is the essential
>> one. Without this one, the VM won't boot; the other two don't seem to
>> change anything in my case).
> 
> Similar to the INTx version, they route the interrupts out through QEMU
> rather than inject them through a side channel with KVM.  They're just
> slower.  Generally these options are only used for debugging as they
> make the interrupts visible to QEMU, functionality is generally not
> affected.

Thank you very much - got it.

> What interrupt mode does the device operate in once the VM is running?
> You can run 'lspci -vs <device address>' on the host and see something
> like:
> 
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
> 
> In this case the Enable+ shows the device is using MSI-X rather than
> MSI, which shows Enable-.  The device might not support both (or
> either).  If none are Enable+, legacy interrupts are probably being
> used.

It says:

...
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
...
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
...

Nothing else containing the string "MSI" is in the output.
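For the runtime check, I will look at the host while the VM is running,
roughly like this (the device address is a placeholder), and report
which mode is actually in use:

lspci -vs 03:00.0 | grep -i msi
grep vfio /proc/interrupts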

> Often legacy interrupts are only used at boot and then the device
> switches to MSI/X.  If that's the case for this device, x-no-kvm-intx
> doesn't really hurt you runtime.

>> - Could we expect your patch to go into upstream (perhaps after the
>> above issues / questions have been investigated)? I will try to convince
>> the Debian people to include the patch into 4.9; if they refuse, I will
>> have to compile a new kernel each time they release one, which happens
>> quite often (probably security fixes) since some time ...
> 
> I would not recommend trying to convince Debian to take a non-upstream
> patch, the process is that I need to do more research to figure out
> why this device isn't already quirked, I'm sure others have complained,
> but did similar patches make things worse for them or did they simply
> disappear. Can you confirm whether the device behaves properly for
> host use with the patch?  Issues with assigning the device could be
> considered secondary if the host behavior is obviously improved.

I can definitely confirm that the patch vastly improves behavior for the
host. As I described in my first message, without the patch and with
intel_iommu=on, the boot process hung for a minute or so while the
kernel tried to initialize that controller, obviously hitting timeouts
and spitting out error messages multiple times. The two most relevant
messages were that the SATA link speed was being reduced (one time to
3.0 Gbps, the next time to 1.5 Gbps, repeating multiple times) and that
the disks could not be identified (if any were connected). Both messages
appeared for both channels, and consequently, the respective block
devices were missing after the boot process had finished.

I have verified this behavior multiple times with the controller card
connected to different slots, with and without HDDs connected, and after
cold boots as well as after warm boots.

There were no issues when the kernel parameter intel_iommu had *not*
been given.

With your patch applied, the system boots up without any problem whether
or not intel_iommu=on is given. I have verified this multiple times,
putting the controller in different slots. In every case, the boot
process went normally, and once the system had finished booting, the
disks connected to the controller showed up as block devices as
expected.

Likewise, I have tested the behavior with the patched kernel, but
without the intel_iommu parameter. I did not notice any problems.

All tests were done with the Debian 4.9.0 kernel with Debian patches
(version 65). To patch the kernel, I downloaded the Debian kernel source
package, unpacked it, copied the config from the stock kernel, applied
your patch and then compiled.
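From memory, the procedure was roughly the following (directory and
patch file names are placeholders; details may have differed slightly):

apt-get source linux                    # Debian kernel source package
cd linux-4.9.*
cp /boot/config-$(uname -r) .config     # config from the stock kernel
patch -p1 < ../marvell-quirk.patch      # your patch
make olddefconfig
make -j"$(nproc)" bindeb-pkg            # builds installable .deb packages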

During yesterday's research, I had the system running most of the time
without passing through that controller (because pass-through didn't
work yet); instead, I had passed the two disks (i.e. the block devices)
connected to the controller into the VM in question via virtio. I did
not notice any problem or misbehavior. This is a production (VM) server,
so I surely would have noticed if there had been problems :-)
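For reference, the two disks were handed to that VM with something like
the following (device paths and IDs are placeholders):

-drive file=/dev/disk/by-id/ata-DISK1,if=none,format=raw,cache=none,id=hd1
-device virtio-blk-pci,drive=hd1
-drive file=/dev/disk/by-id/ata-DISK2,if=none,format=raw,cache=none,id=hd2
-device virtio-blk-pci,drive=hd2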

In summary, despite the short testing time, we can conclude:

1) Your patch only affects people with a Marvell ...9128 SATA chipset.

2) People without intel_iommu=on do not benefit from your patch, and are
not hurt by it.

3) People with intel_iommu=on and a stock kernel will not be able to
boot cleanly if that SATA chip is in the system; the disks connected to
that chip probably won't be recognized (as in my case), and even if they
are, it would probably be dangerous to use them.

4) People with the patched kernel will be able to use that controller
without any problem, whether intel_iommu=on is given or not; at least, I
can definitely confirm that the boot problems are being solved by that
patch.

Long-term stability should be tested further. Although I am personally
convinced and will use the controller in production (either for
pass-through, if I can make it work in terms of performance, or in
another machine for the host system), I take no responsibility; I am
just reporting my personal experience.

> Alternatively, the 9230, or various others in that section of the
> quirk code, are already quirked, so you can decide if picking a
> different $30 card is a better option for you ;) Thanks,

Perhaps I'll even buy two different ones: one with the 9230 (though I
seriously wonder why its design should be less flawed than that of the
9128), and one with the ASM1061 (hoping there is at least one company
which got it right - getting a Windows driver for that one could be a
nightmare, though).

Thank you very much again,

Binarus


