On 1/10/2023 9:38 PM, Vasant Hegde wrote:
> Matt,
>
>
> On 1/6/2023 12:58 PM, Matt Fagnani wrote:
>> I booted 6.2-rc2 + patch with rd.driver.blacklist=amdgpu on the kernel command
>> line to prevent amdgpu from being started while the initramfs was in use. The
>> black screen problem happened later in the boot. I pressed sysrq+alt+s,u,b to do
>> an emergency sync, remount read-only, and reboot. The journal for that boot was
>> shown on the next boot. The two warnings which I previously reported weren't
>> shown in the journal, but the same null pointer dereference which made amdgpu
>> crash happened. I'm attaching the kernel log from the journal of that boot.
>>
>
> Thanks for your effort to get boot log. This is helpful.
>
> Looking into the code further,
> iommu_detach_group() didn't attach devices back to default_domain.

... because iommu_detach_group() expects new domain should be different from
group->domain.

-Vasant

> So IOMMU
> point of view device group was left in inconsistent state. This resulted in
> IOMMU throwing page fault errors and amd IOMMU event handler code always assumes
> that domain is setup properly. That resulted in below NULL pointer dereference
> issue.
>
> Jan 06 02:07:52 kernel: BUG: kernel NULL pointer dereference, address:
> 0000000000000058
> Jan 06 02:07:52 kernel: #PF: supervisor read access in kernel mode
> Jan 06 02:07:53 kernel: #PF: error_code(0x0000) - not-present page
> Jan 06 02:07:53 kernel: PGD 0 P4D 0
> Jan 06 02:07:53 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
> Jan 06 02:07:53 kernel: CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Not tainted
> 6.2.0-rc2+ #89
> Jan 06 02:07:53 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52
> 12/03/2019
> Jan 06 02:07:53 kernel: RIP: 0010:report_iommu_fault+0x11/0x90
>
> Ideally if domain attach fails (in this case its because pasid capability check
> returned error) we should put devices back to original domain.. so that it can
> continue without PASID capability.
>
> I have a patch to handle these error conditions (not the fix for original
> issue). I will try to post it soon.
>
> -Vasant
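
For illustration only, the "put devices back to original domain" idea quoted above can be sketched as a small userspace C model. This is not the kernel code and not the patch mentioned in the thread: struct toy_domain, struct toy_group, toy_attach() and toy_report_fault() are hypothetical names made up for this sketch, and the "PASID capability" is reduced to a single flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel types. */
struct toy_domain {
	const char *name;
	int supports_pasid;	/* toy flag standing in for the capability check */
};

struct toy_group {
	struct toy_domain *domain;		/* currently attached domain */
	struct toy_domain *default_domain;	/* fallback domain */
};

/*
 * Toy attach: the old domain is dropped first, then the new one is
 * checked. If the check fails, the group is put back on the domain it
 * had before, so it is never left without a domain.
 */
static int toy_attach(struct toy_group *g, struct toy_domain *new_domain)
{
	struct toy_domain *old;

	assert(g != NULL && new_domain != NULL);

	old = g->domain;
	g->domain = NULL;		/* window where the group has no domain */

	if (!new_domain->supports_pasid) {
		g->domain = old;	/* roll back instead of leaving NULL */
		return -1;
	}

	g->domain = new_domain;
	return 0;
}

/*
 * Toy fault handler: checks the domain pointer rather than assuming it
 * is always set up. Returns 0 if the fault could be reported, -1 if not.
 */
static int toy_report_fault(const struct toy_group *g)
{
	if (g == NULL || g->domain == NULL)
		return -1;
	return 0;
}
```

In this model, a failed toy_attach() returns -1 and leaves g->domain pointing at the previous domain, so a later toy_report_fault() still finds a valid pointer. Without the rollback line the group would be left with a NULL domain, which is the kind of inconsistent state described in the quoted text.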