Thank you for all your comments! I really appreciate your help with this. I will address the style feedback once we reach a decision on how to fix this bug. I will first respond to your comments, and then I will list the possible solutions to this bug in a way that takes all of your insights into account.

On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> Can you include a citation (spec name, revision, section) for this
> DMAR requirement?

This was my mistake; I misinterpreted what a firmware developer told me. This is an ACPI firmware requirement from Windows, and it is not in the DMAR spec. Windows uses it to identify externally exposed PCIe root ports:

https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

> But I don't see where the defect is here. And I doubt that this is
> really a unique situation. So it's likely that this will happen on
> other systems, and we don't want to have to add quirks every time
> another one shows up.
...
> don't have the new interface. But we at least need a plan that
> doesn't require quirks indefinitely.
...

On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> wrote:
> This is not scalable at all. You would need to include lots of systems
> here. And there should be no issue at all anyways.

My team tests hundreds of different devices, and so far this is the only one we have seen exhibit this issue. None of the other devices we have seen has a discrete internal Thunderbolt controller that is treated as a removable device. Therefore, we don't expect a large number of devices to need this quirk.

> There is really nothing "unique" here. It's exactly as specified by
> this:
>
> https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
>
> and being used in many many system already out there and those have been
> working just fine.

I don't know how many computers have a discrete Thunderbolt chip separate from the CPU, but it does not seem to be common. These devices were made during a narrow window of time before CPUs had Thunderbolt features, so Lenovo had to add a separate JHL6540 chip to include Thunderbolt on the X1 Carbon Gen 7/8.

As you said, these devices do work fine when it doesn't matter whether a PCI Thunderbolt device is internal or external, which is most cases. Problems arise only when someone adds a security policy, or some other feature, that depends on the distinction between a fixed and a removable PCI device.

> This has been working just fine so far and as far as I can tell there is
> no such "policy" in place in the mainline kernel.

Correct, there is no such policy in the mainline kernel today. The bug is that the Linux kernel's "removable" property is inaccurate for this device.

> Can you elaborate what the issue is and which mainline kernel you are
> using to reproduce this?

Thanks for this question! On a Lenovo ThinkPad X1 Carbon Gen 7/Gen 8 running Linux, the properties of the JHL6540 Thunderbolt controller show that it is incorrectly labeled as removable. I have reproduced this bug on Linux 6.7-rc1 (b85ea95d0864).
Before my patch, the JHL6540 controller is inaccurately labeled "removable":

$ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e {removable} -e {device} -e {vendor} -e looking
  looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
    ATTR{device}=="0x15d3"
    ATTR{removable}=="removable"
    ATTR{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00/0000:00:1d.4':
    ATTRS{device}=="0x02b4"
    ATTRS{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00':

After applying the patch in this thread, the JHL6540 controller is labeled "fixed":

$ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e {removable} -e {device} -e {vendor} -e looking
  looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
    ATTR{device}=="0x15d3"
    ATTR{removable}=="fixed"
    ATTR{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00/0000:00:1d.4':
    ATTRS{device}=="0x02b4"
    ATTRS{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00':

Here is what I have come up with based on your comments. I see two options for resolving this:

1) Fix this by adding a new firmware interface, as Bjorn Helgaas brought up.
2) Address this through a cleaned-up version of this patch.

If the solution is to add a firmware interface, how would I go about that process? Could you put me in touch with someone who has that know-how? Would we keep a temporary software quirk in place while the firmware spec is being updated?

I am deferring to your expertise and knowledge in solving this bug. Thank you for all your help.