On Fri, 6 Dec 2024 17:01:01 +0100 Thierry Reding <thierry.reding@xxxxxxxxx> wrote: > On Fri, Dec 06, 2024 at 08:42:03AM -0500, Parker Newman wrote: > > On Wed, 4 Dec 2024 19:06:30 +0100 > > Thierry Reding <thierry.reding@xxxxxxxxx> wrote: > > > > > On Wed, Dec 04, 2024 at 11:53:17AM -0500, Parker Newman wrote: > > > > On Wed, 4 Dec 2024 17:23:53 +0100 > > > > Thierry Reding <thierry.reding@xxxxxxxxx> wrote: > > > > > > > > > On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > > > > > > On Tue, 19 Nov 2024 20:18:00 +0100 > > > > > > Andrew Lunn <andrew@xxxxxxx> wrote: > > > > > > > > > > > > > > I think there is some confusion here. I will try to summarize: > > > > > > > > - Ihe iommu is supported by the Tegra SOC. > > > > > > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > > > > > > > > > > > If it is required, please also include a patch to > > > > > > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > > > > > > > > > > > > > > > > I will add this when I submit a v2 of the patch. > > > > > > > > > > > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > > > > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > > > > > > - There are no device tree changes required to to make this patch work. > > > > > > > > - This patch works fine with existing device trees. > > > > > > > > > > > > > > > > I will add the fallback however in case there is changes made to the iommu > > > > > > > > subsystem in the future. > > > > > > > > > > > > > > I would suggest you make iommus a required property and run the tests > > > > > > > over the existing .dts files. > > > > > > > > > > > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > > > > > > added in: > > > > > > > > > > > > > > 610cdf3186bc604961bf04851e300deefd318038 > > > > > > > Author: Thierry Reding <treding@xxxxxxxxxx> > > > > > > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > > > > > > > > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > > > > > > > > > > > and the iommus property is present. So the requires is safe. > > > > > > > > > > > > > > Please expand the commit message. It is clear from all the questions > > > > > > > and backwards and forwards, it does not provide enough details. > > > > > > > > > > > > > > > > > > > I will add more details when I submit V2. > > > > > > > > > > > > > I just have one open issue. The code has been like this for over 2 > > > > > > > years. Why has it only now started crashing? > > > > > > > > > > > > > > > > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > > > > > > provides a custom kernel package with many out of tree drivers including a > > > > > > driver for the mgbe controllers. > > > > > > > > > > > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > > > > > > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > > > > > > that use 2 or more of the mgbe controllers which is why we found the bug. > > > > > > > > > > Correct. Also, this was a really stupid thing that I overlooked. I don't > > > > > recall the exact circumstances, but I vaguely recall there had been > > > > > discussions about adding the tegra_dev_iommu_get_stream_id() helper > > > > > (that this patch uses) around the time that this driver was created. In > > > > > the midst of all of this I likely forgot to update the driver after the > > > > > discussions had settled. > > > > > > > > > > Anyway, I agree with the conclusion that we don't need a compatibility > > > > > fallback for this, both because it would be actively wrong to do it and > > > > > we've had the required IOMMU properties in device tree since the start, > > > > > so there can't be any regressions caused by this. > > > > > > > > > > I don't think it's necessary to make the iommus property required, > > > > > though, because there's technically no requirement for these devices to > > > > > be attached to an IOMMU. They usually are, and it's better if they are, > > > > > but they should be able to work correctly without an IOMMU. > > > > > > > > > Thanks for confirming from the Nvidia side! I wasn't sure if they would > > > > work without the iommu. That said, if you did NOT want to use the iommu > > > > and removed the iommu DT property then the probe will fail after my patch. > > > > Would we not need a guard around the writes to MGBE_WRAP_AXI_ASID0_CTRL as well? > > > > > > Well... frankly, I don't know. There's an override in the memory > > > controller which we set when a device is attached to an IOMMU. That's > > > usually how the stream ID is programmed. If we don't do that it will > > > typically default to a special passthrough stream ID (technically the > > > firmware can lock down those register and force them to a specific > > > stream ID all the time). I'm not sure what exactly the impact is of > > > these ASID registers (there's a few other instances where those are > > > needed). They are required if you want to use the IOMMU, but I don't > > > know what their meaning is if the IOMMU is not enabled. > > > > > > There's also different cases: the IOMMU can be disabled altogether, in > > > which case no page tables and such exist for any device and no > > > translation should happen whatsoever. But there's also the case where an > > > IOMMU is enabled, but certain devices shouldn't attach to them. I should > > > make it very clear that both of these are not recommended setups and I > > > don't know if they'll work. And they are mostly untested. I've been > > > meaning, but haven't found the time, to test some of these cases. > > > > > > > Yes I agree, all of those situations are very niche and not recommended for > > a Tegra platform that should always have the iommu enabled anyways. Adding an > > iommu bypass would probably be outside of the scope of this patch. > > > > I will not add a patch marking the iommu as required in the device tree > > bindings when I submit v2 unless anyone else feels different. > > I was able to find a bit more information on this. Starting with > Tegra234 it's usually no longer possible to even enable bypass. It can > still be done, but it needs to be carefully coordinated between the > secure bootloader/firmware and the OS. Basically the secure firmware > can lock down the ability to bypass the IOMMU. If the firmware was > configured to allow bypass, the driver can then do so by writing the > special stream ID 0x7f into the stream ID register. > > On these newer chips the memory controller override no longer has any > effect and writing the per-device stream ID registers is the only way to > attach to the IOMMU. > > There's still the case where you can disable the IOMMU altogether, in > which case the IOMMU will still be bypassed, no matter what the firmware > did. My understanding is that it doesn't matter in those cases whether > we write the stream ID registers or not, they will simply get ignored. > With one exception perhaps being the bypass SID. If you write that, then > there's a protection mechanism that kicks in. > > Well, after all this this still isn't entirely clear to me, but I think > what it means in a nutshell is that a) we'll want to keep the IOMMU > always on for security and because the firmware is by default configured > to not allow bypassing, b) IOMMU isn't strictly required because the > IOMMU might be completely disabled and c) we shouldn't need to > conditionalize the stream ID register writes. > > That said, the tegra_dev_iommu_get_stream_id() function returns a bool > specifically to support the latter case. So the intention with the > design was that drivers would call that function and only need to write > the stream ID register if it returns true. That's not always great > because you may want (or need) to rewrite the register after suspend/ > resume, in which case you probably want to resort to a cached value > rather than call that API. On the other hand the API is quite simple, so > it could serve as a cache as well. > I guess I could add an IS_ENABLED(CONFIG_IOMMU_API) check if tegra_dev_iommu_get_stream_id() returns false in probe? That would cover if the IOMMU_API is not enabled but not if the Tegra's iommu is disabled in other ways. -Parker