Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

On 5/5/2024 07:37, Mario Limonciello wrote:


On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
[CCing Mario, who asked for the two suspected commits to be backported]

On 05.05.24 03:12, Micha Albert wrote:

     I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
In 6.8.7, this works as expected, and my Plymouth screen (including the
LUKS password prompt) shows on my 2 monitors connected to the GPU as
well as my main laptop screen. Upon entering the password, I'm put into
userspace as expected. However, upon upgrading to 6.8.8, I will be
greeted with the regular password prompt, but after entering my password
and waiting for it to be accepted, my eGPU will reset and not function.
I can tell that it resets since I can hear the click of my ATX power
supply turning off and on again, and the status LED of the eGPU board
goes from green to blue and back to green, all in less than a second.

    I talked to a friend, and we found out that the kernel parameter
thunderbolt.host_reset=false fixes the issue. He also thinks that
commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
suspicious. I've attached the output of dmesg when the error was
occurring, since I'm still able to use my laptop normally when this
happens, just not with my eGPU and its connected displays.
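
To confirm that the workaround is actually in effect on a given boot, one can look for the parameter on the kernel command line. The following is only a minimal sketch: the helper name get_kernel_param is made up for illustration, and it just splits a command-line string on whitespace (it does not handle quoted values).

```python
def get_kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    Hypothetical helper for illustration. A bare flag without "=" is
    reported as an empty string. If the parameter appears more than once,
    this sketch keeps the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value
```

On a running system the string would typically come from /proc/cmdline, for example get_kernel_param(open("/proc/cmdline").read(), "thunderbolt.host_reset").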

Thx for the report. Could you please test whether 6.9-rc6 (or a later
snapshot; or -rc7, which should be out in about 18 hours) is affected
as well? That would be really important to know.

It would also be great if you could try reverting the two patches you
mentioned and see if they are really what's causing this. IIRC there are
two more; you might need to revert some or all of them too, in the
order they were applied.

There are two other things that I think would be good to check in order to understand this issue.

1) Is it related to trusted devices handling?

You can try applying this commit to either 6.8.y or 6.9-rc:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iommu/fixes&id=0f91d0795741c12cee200667648669a91b568735

2) Is it because you have amdgpu in your initramfs but not thunderbolt?

If so, there's very likely an ordering issue:

[    2.325788] [drm] GPU posting now...
[   30.360701] ACPI: bus type thunderbolt registered

Can you remove amdgpu from your initramfs and wait for it to start up after you pivot to the real rootfs?  Does the problem still happen?
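
The ordering can also be checked mechanically from a saved dmesg log. This is a rough sketch, not an established tool: the function names are made up for illustration, and the two message substrings are simply taken from the log lines quoted above, so other kernels or drivers may word them differently.

```python
import re

# Matches lines such as "[    2.325788] [drm] GPU posting now..."
_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_timestamp(dmesg_lines, needle):
    """Return the timestamp of the first message containing `needle`, or None."""
    for line in dmesg_lines:
        m = _LINE.match(line.strip())
        if m and needle in m.group(2):
            return float(m.group(1))
    return None


def gpu_posted_before_thunderbolt(dmesg_lines):
    """True if the GPU was posted before the thunderbolt bus type registered.

    Returns None when either message is missing from the log.
    """
    gpu = first_timestamp(dmesg_lines, "GPU posting now")
    tbt = first_timestamp(dmesg_lines, "bus type thunderbolt registered")
    if gpu is None or tbt is None:
        return None
    return gpu < tbt
```

Feeding it the lines of the attached dmesg (for example open("dmesg.txt").read().splitlines(), where the file name is just a placeholder) would show whether amdgpu came up before thunderbolt on that boot.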


One more thought. When you say it will "not function", is it authorized in the thunderbolt sysfs?

See https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/thunderbolt.rst

Does it still show up in lspci?
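
For the sysfs part, a small sketch along these lines could collect the authorization state of every device. It assumes the layout described in the admin guide linked above, where each device directory under /sys/bus/thunderbolt/devices has an "authorized" attribute (0 meaning not authorized, non-zero meaning authorized); the function name is made up for illustration.

```python
from pathlib import Path


def thunderbolt_authorization(root="/sys/bus/thunderbolt/devices"):
    """Map each Thunderbolt device name to the content of its `authorized` file.

    Entries without an `authorized` attribute (such as domains) are skipped.
    Returns an empty dict if `root` does not exist, e.g. when the
    thunderbolt driver is not loaded.
    """
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for dev in sorted(base.iterdir()):
        attr = dev / "authorized"
        if attr.is_file():
            result[dev.name] = attr.read_text().strip()
    return result
```

Running print(thunderbolt_authorization()) before and after the reset should show whether the eGPU is still listed and whether it is still authorized.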


Ciao, Thorsten

P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: eGPU disconnected during boot





