On 12/21/2023 7:01 PM, Lukas Wunner wrote:
On Thu, Dec 21, 2023 at 11:39:40AM +0100, Lukas Wunner wrote:
On Tue, Dec 19, 2023 at 07:51:53PM -0500, Ethan Zhao wrote:
For endpoint devices connected to the system via hotplug-capable ports,
users can request a warm reset of the device by flapping the device's link
through the slot's link control register. As part of the pciehp_ist() DLLSC
interrupt sequence, pciehp unloads the device driver and then powers the
device off. This causes an IOMMU devTLB flush request to be sent to the
device, followed by a long wait for completion or timeout in interrupt context.
I think the problem is in the "waiting in interrupt context".
I'm wondering whether Intel IOMMUs possibly have a (perhaps undocumented)
capability to reduce the Invalidate Completion Timeout to a sane value?
Could you check whether that's supported?
Granted, the Implementation Note you've pointed to allows 1 sec + 50%,
but that's not even a "must", it's a "should". So devices are free to
take even longer. We have to cut off at *some* point.
I really "expected" there to be an interrupt signal to the IOMMU hardware
when the device downstream of the PCIe switch is 'gone', or some internal
polling/heartbeating of the endpoint device to detect that ATS is broken,
but so far it seems there are only hotplug interrupts for downstream
control.
...
How do we define the point, "some" msec, at which software times out and
breaks out of the waiting loop? Or should it poll whether the target is gone?
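
To make the two options concrete, here is a rough user-space sketch of a
wait loop that gives up either after a fixed poll budget or as soon as the
target is seen to be gone. It is only an illustration, not the intel-iommu
code: struct sim_hw, sim_inv_done(), sim_dev_present() and
wait_inv_completion() are all made-up names, and the "hardware" is
simulated with counters. In the kernel the presence check might be
something like pci_device_is_present(), but that is an assumption about
where such a check would fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for the sketch, not kernel API. */
enum wait_result { WAIT_DONE, WAIT_DEV_GONE, WAIT_TIMEOUT };

/* Simulated hardware state (assumption: stands in for real registers). */
struct sim_hw {
	unsigned int polls;      /* number of polls performed so far */
	unsigned int done_after; /* completion visible from this poll on, 0 = never */
	unsigned int gone_after; /* device absent from this poll on, 0 = never */
};

/* Stand-in for "has the invalidation wait descriptor completed?" */
static bool sim_inv_done(const struct sim_hw *hw)
{
	return hw->done_after && hw->polls >= hw->done_after;
}

/* Stand-in for "is the target device still reachable?" */
static bool sim_dev_present(const struct sim_hw *hw)
{
	return !(hw->gone_after && hw->polls >= hw->gone_after);
}

/*
 * Poll for completion, but break out early if the device has vanished,
 * and give up once max_polls polls have been spent.
 */
static enum wait_result wait_inv_completion(struct sim_hw *hw,
					    unsigned int max_polls)
{
	assert(hw != NULL);

	for (hw->polls = 0; hw->polls < max_polls; hw->polls++) {
		if (sim_inv_done(hw))
			return WAIT_DONE;
		if (!sim_dev_present(hw))
			return WAIT_DEV_GONE;
	}
	return WAIT_TIMEOUT;
}
```

With the presence check in the loop, the poll budget only matters for a
device that is still there but slow, so it could stay generous without
stalling the surprise-removal path.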
Thanks,
Ethan
Thanks,
Lukas