On 12/14/2023 10:16 AM, Ethan Zhao wrote:
On 12/13/2023 6:44 PM, Lukas Wunner wrote:
On Tue, Dec 12, 2023 at 10:46:37PM -0500, Ethan Zhao wrote:
For endpoint devices connected to the system via hotplug-capable ports,
users could request a warm reset of the device by flapping the device's
link through setting the slot's link control register,
Well, users could just *unplug* the device, right? Why is it relevant
that they could fiddle with registers in config space?
Yes, if the device and its slot are hotplug capable, users could just
'unplug' the device.
But in the reported case, users tried to do a warm reset with a tool
command like:
mlxfwreset -d <busid> -y reset
Actually, it accesses configuration space just like:
setpci -s 0000:17:01.0 0x78.L=0x21050010
Well, we can't tell users not to fiddle with PCIe config space
registers like that.
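To make that write concrete, here is a rough sketch that decodes the value, assuming offset 0x78 on this port is the Link Control / Link Status dword of its PCI Express capability (that placement is an assumption; the thread does not state where the capability starts). The bit masks match the kernel's pci_regs.h definitions; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit masks as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define LNKCTL_LD     0x0010u /* Link Control: Link Disable */
#define LNKSTA_NLW    0x03f0u /* Link Status: Negotiated Link Width */
#define LNKSTA_DLLLA  0x2000u /* Link Status: Data Link Layer Link Active */

/* Split a 32-bit config value into its Link Control and Link Status halves. */
static void split_lnk_dword(uint32_t val, uint16_t *lnkctl, uint16_t *lnksta)
{
	assert(lnkctl != NULL && lnksta != NULL);
	*lnkctl = (uint16_t)(val & 0xffffu); /* lower 16 bits: Link Control */
	*lnksta = (uint16_t)(val >> 16);     /* upper 16 bits: Link Status */
}

/* True if writing val would set the Link Disable bit. */
static bool write_disables_link(uint32_t val)
{
	uint16_t lnkctl, lnksta;

	split_lnk_dword(val, &lnkctl, &lnksta);
	return (lnkctl & LNKCTL_LD) != 0;
}

/* Negotiated link width (number of lanes) encoded in a Link Status value. */
static unsigned int negotiated_width(uint16_t lnksta)
{
	return (lnksta & LNKSTA_NLW) >> 4;
}

/* True if the Link Status value reports the data link layer as active. */
static bool dll_link_active(uint16_t lnksta)
{
	return (lnksta & LNKSTA_DLLLA) != 0;
}
```

Under that assumption, the Link Control half of 0x21050010 is 0x0010, so the write sets only Link Disable. The 0x2105 half would fall on Link Status, whose width and link-active fields are read-only, so only the Link Disable bit takes effect and brings the link down.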
as the pciehp_ist() DLLSC interrupt sequence response, pciehp will
unload the device driver and then power it off. This causes an IOMMU
devTLB flush request to be sent to the device, followed by a long wait
for completion/timeout in interrupt context.
A completion timeout should be on the order of usecs or msecs, why
does it cause a hard lockup? The dmesg excerpt you've provided shows
a 12 *second* delay between hot removal and watchdog reaction.
In my understanding, the devTLB flush request sent to an ATS-capable
device is a non-posted request. If the ATS transaction is broken by an
endpoint link-down or power-off event, the timeout can take 60 seconds
(+50%/-0%), i.e. up to 90 seconds; see the "Invalidate Completion
Timeout" implementation note in section 10.3.1, "Invalidate Request",
of PCIe spec 6.1:
"
IMPLEMENTATION NOTE:
INVALIDATE COMPLETION TIMEOUT
Devices should respond to Invalidate Requests within 1 minute (+50%
-0%). Having a bounded time permits an ATPT to implement Invalidate
Completion Timeouts and reuse the associated ITag values.
ATPT designs are implementation specific. As such, Invalidate
Completion Timeouts and their associated error handling are outside
the scope of this specification.
"
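As a small worked example of the numbers in that note, the sketch below computes the response window (1 minute, +50% -0%) and compares it with the roughly 12 second watchdog reaction mentioned earlier in this thread. The macro and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "within 1 minute (+50% -0%)" from the implementation note. */
#define INV_NOMINAL_MS  60000u
#define INV_PLUS_PCT    50u
#define INV_MINUS_PCT   0u

/* Earliest point at which a timeout may be declared. */
static uint32_t inv_timeout_min_ms(void)
{
	assert(INV_MINUS_PCT <= 100u);
	return INV_NOMINAL_MS - INV_NOMINAL_MS * INV_MINUS_PCT / 100u;
}

/* Latest point at which a timeout may be declared. */
static uint32_t inv_timeout_max_ms(void)
{
	return INV_NOMINAL_MS + INV_NOMINAL_MS * INV_PLUS_PCT / 100u;
}

/*
 * True if a watchdog with the given period would fire before the
 * invalidation is even allowed to time out.
 */
static bool watchdog_fires_first(uint32_t watchdog_ms)
{
	return watchdog_ms < inv_timeout_min_ms();
}
```

The window works out to 60000 to 90000 ms, so a watchdog reacting after about 12 seconds trips long before an invalidation sent to a vanished device would be allowed to time out.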
Fix it by checking the device's error_state in
devtlb_invalidation_with_pasid() to avoid sending a meaningless devTLB
flush request to a link-down device that is set to
pci_channel_io_perm_failure and then powered off in
This doesn't seem to be a proper fix. It will work most of the time
but not always. A user might bring down the slot via sysfs, then yank
the card from the slot just when the iommu flush occurs such that the
pci_dev_is_disconnected(pdev) check returns false but the card is
physically gone immediately afterwards. In other words, you've shrunk
the time window during which the issue may occur, but haven't eliminated
it completely.
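The check-then-flush pattern under discussion can be modelled in userspace roughly as follows. This is only a sketch: struct model_dev and the model_* helpers are hypothetical stand-ins, not the kernel's struct pci_dev or the actual patch. The disconnect test mirrors the idea of pci_dev_is_disconnected(), which compares error_state against pci_channel_io_perm_failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's pci_channel_state_t values. */
enum channel_state {
	CHANNEL_IO_NORMAL,
	CHANNEL_IO_FROZEN,
	CHANNEL_IO_PERM_FAILURE,
};

/* Hypothetical minimal device model; not struct pci_dev. */
struct model_dev {
	enum channel_state error_state;
	bool ats_enabled;
	unsigned int flushes_sent;
};

/* A snapshot of error_state, in the spirit of pci_dev_is_disconnected(). */
static bool model_dev_is_disconnected(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->error_state == CHANNEL_IO_PERM_FAILURE;
}

/* Returns true if a devTLB flush request was issued to the device. */
static bool model_devtlb_flush(struct model_dev *dev)
{
	if (!dev->ats_enabled)
		return false; /* no device TLB to invalidate */

	if (model_dev_is_disconnected(dev))
		return false; /* skip the request: the device is marked gone */

	/*
	 * Race window: the check above is only a snapshot. The card can
	 * still be pulled between the check and the request below, which
	 * is the gap described in the review comment.
	 */
	dev->flushes_sent++;
	return true;
}
```

In this model, a device marked CHANNEL_IO_PERM_FAILURE never receives a flush, but nothing prevents the state from changing right after the check, so the check narrows the window rather than closing it.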
If you mean disabling the slot via sysfs, that's SAFE_REMOVAL, right?
That would issue the devTLB invalidation first and power off the device
later, so it wouldn't trigger the hard lockup even though
pci_dev_is_disconnected() returns false. This fix works in such a case.
Could you help to point out if there is any other window to close?
Thanks,
Ethan
Thanks,
Ethan
Thanks,
Lukas