Matt,

On 1/6/2023 12:58 PM, Matt Fagnani wrote:
> I booted 6.2-rc2 + patch with rd.driver.blacklist=amdgpu on the kernel command
> line to prevent amdgpu from being started while the initramfs was in use. The
> black screen problem happened later in the boot. I pressed sysrq+alt+s,u,b to do
> an emergency sync, remount read-only, and reboot. The journal for that boot was
> shown on the next boot. The two warnings which I previously reported weren't
> shown in the journal, but the same null pointer dereference which made amdgpu
> crash happened. I'm attaching the kernel log from the journal of that boot.
>

Thanks for your effort to get the boot log. This is helpful.

Looking into the code further, iommu_detach_group() didn't attach the devices
back to default_domain, so from the IOMMU's point of view the device group was
left in an inconsistent state. This resulted in the IOMMU throwing page fault
errors, and the AMD IOMMU event handler code always assumes that the domain is
set up properly. That resulted in the NULL pointer dereference below.

Jan 06 02:07:52 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000058
Jan 06 02:07:52 kernel: #PF: supervisor read access in kernel mode
Jan 06 02:07:53 kernel: #PF: error_code(0x0000) - not-present page
Jan 06 02:07:53 kernel: PGD 0 P4D 0
Jan 06 02:07:53 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jan 06 02:07:53 kernel: CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Not tainted 6.2.0-rc2+ #89
Jan 06 02:07:53 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
Jan 06 02:07:53 kernel: RIP: 0010:report_iommu_fault+0x11/0x90

Ideally, if domain attach fails (in this case because the PASID capability
check returned an error), we should put the devices back in the original domain
so that they can continue without PASID capability.

I have a patch to handle these error conditions (not the fix for the original
issue). I will try to post it soon.

-Vasant
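
For illustration only, the "put devices back on attach failure" idea could be
sketched as a small userspace model. This is not the kernel code and not the
patch mentioned above; every type and function name here (model_domain,
model_device, model_group, model_attach_dev, model_attach_group) is
hypothetical, and the PASID check is reduced to a single flag.

```c
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel types or APIs. */
struct model_domain {
	int needs_pasid;		/* domain requires PASID support */
};

struct model_device {
	int has_pasid;			/* stand-in for the PASID capability check */
	struct model_domain *domain;	/* domain the device is currently attached to */
};

struct model_group {
	struct model_device *devs;
	size_t ndevs;
	struct model_domain *default_domain;
};

/* Attach one device; fails if the domain needs PASID and the device lacks it. */
static int model_attach_dev(struct model_domain *dom, struct model_device *dev)
{
	if (dom->needs_pasid && !dev->has_pasid)
		return -1;
	dev->domain = dom;
	return 0;
}

/*
 * Attach every device in the group. On any failure, move all devices
 * (including the ones already switched) back to the group's default
 * domain, so that no device is left without a valid domain.
 */
static int model_attach_group(struct model_domain *dom, struct model_group *grp)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < grp->ndevs; i++) {
		ret = model_attach_dev(dom, &grp->devs[i]);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Roll back: restore a consistent state for the whole group. */
	for (i = 0; i < grp->ndevs; i++)
		grp->devs[i].domain = grp->default_domain;

	return ret;
}
```

With this model, a failed attach still returns an error to the caller, but a
fault handler that later looks up a device's domain finds the default domain
rather than a NULL pointer.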