On 5/15/2018 9:47 AM, Joseph Salisbury wrote:
> On 05/15/2018 09:08 AM, Tom Lendacky wrote:
>> On 5/15/2018 7:34 AM, Nath, Arindam wrote:
>>>
>>>> -----Original Message-----
>>>> From: Joseph Salisbury [mailto:joseph.salisbury at canonical.com]
>>>> Sent: Tuesday, May 15, 2018 5:40 PM
>>>> To: Nath, Arindam <Arindam.Nath at amd.com>
>>>> Cc: iommu at lists.linux-foundation.org; Bridgman, John
>>>> <John.Bridgman at amd.com>; joro at 8bytes.org;
>>>> amd-gfx at lists.freedesktop.org; drake at endlessm.com; stein12c at gmail.com;
>>>> Suthikulpanit, Suravee <Suravee.Suthikulpanit at amd.com>; Deucher,
>>>> Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix
>>>> <Felix.Kuehling at amd.com>; linux at endlessm.com; michel at daenzer.net;
>>>> 1747463 at bugs.launchpad.net; Lendacky, Thomas
>>>> <Thomas.Lendacky at amd.com>
>>>> Subject: Re: iommu/amd: flush IOTLB for specific domains only (v2)
>>>>
>>>> On 05/15/2018 04:03 AM, Nath, Arindam wrote:
>>>>> Adding Tom.
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> My original patch was never accepted. Tom and Joerg worked on another
>>>>> patch series which was supposed to fix the issue in question in
>>>>> addition to do some code cleanups. I believe their patches are already
>>>>> in the mainline. If I remember correctly, one of the patches disabled
>>>>> PCI ATS for the graphics card which was causing the issue.
>>>>>
>>>>> Do you still see the issue with latest mainline kernel?
>>>>>
>>>>> BR,
>>>>> Arindam
>>>>>
>>>>> -----Original Message-----
>>>>> From: Joseph Salisbury [mailto:joseph.salisbury at canonical.com]
>>>>> Sent: Tuesday, May 15, 2018 1:17 AM
>>>>> To: Nath, Arindam <Arindam.Nath at amd.com>
>>>>> Cc: iommu at lists.linux-foundation.org; Bridgman, John
>>>>> <John.Bridgman at amd.com>; joro at 8bytes.org;
>>>>> amd-gfx at lists.freedesktop.org; drake at endlessm.com;
>>>>> stein12c at gmail.com;
>>>>> Suthikulpanit, Suravee <Suravee.Suthikulpanit at amd.com>; Deucher,
>>>>> Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix
>>>>> <Felix.Kuehling at amd.com>; linux at endlessm.com; michel at daenzer.net;
>>>>> 1747463 at bugs.launchpad.net
>>>>> Subject: iommu/amd: flush IOTLB for specific domains only (v2)
>>>>>
>>>>> Hello Arindam,
>>>>>
>>>>> There is a bug report[0] that you created a patch[1] for a while back.
>>>>> However, the patch never landed in mainline. There is a bug reporter in
>>>>> Ubuntu[2] that is affected by this bug and is willing to test the
>>>>> patch. I attempted to build a test kernel with the patch, but it does
>>>>> not apply to currently mainline cleanly. Do you still think this patch
>>>>> may resolve this bug? If so, is there a version of your patch
>>>>> available that will apply to current mainline?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Joe
>>>>>
>>>>> [0] https://bugs.freedesktop.org/show_bug.cgi?id=101029
>>>>> [1] https://patchwork.freedesktop.org/patch/157327/
>>>>> [2] http://pad.lv/1747463
>>>>>
>>>> Hi Arindam,
>>>>
>>>> Thanks for the feedback. Yes, the latest mainline kernel was tested, and it is
>>>> reported the bug still happens in the Ubuntu kernel bug[0]. Is there any
>>>> specific diagnostic info we can collect that might help?
>>>
>>> Joe, I believe all the information needed is already provided in [2].
>>> Let us wait for inputs from Tom and Joerg.
>>>
>>> I could take a look at the issue locally, but it will take me some
>>> really long time since I am occupied with other assignments right now.
>>>
>> I don't see anything in the bug that indicates the latest mainline kernel
>> was tested. The patches/fixes in question are part of the 4.13 kernel, I
>> only see references to 4.10 kernels so I wouldn't expect the issue to be
>> resolved unless the patches from 4.13 were backported to the Ubuntu 4.10
>> kernel.
>>
>> Thanks,
>> Tom
>>
>>> BR,
>>> Arindam
>>>
>>>> Thanks,
>>>>
>>>> Joe
>>>>
>>>> [0] http://pad.lv/1747463
> Hi Tom,
>
> The request to test mainline was in comment #30[0]. However, the bug
> reporter stated the bug still existed on IRC and not in the bug report.
> I'll request he adds the test results to the bug.
>

Ok, I was looking at the wrong bug.

For the original 4.13 kernel, I don't see any attachments that have the
AMD-Vi messages in question. Were they completion timeouts (like in the
later mainline kernel test, which I'll get to in a bit) or I/O page fault
messages? Without that information it is hard to determine what the issue
really is. (Just as an FYI, if the IOMMU is disabled in BIOS, then
iommu=soft is not necessary on the kernel command line).

For the upstream kernel test, since this is a Ryzen system, it's possible
that the BIOS does not have a requisite fix for SME and IOMMU (see [1]).
On the upstream kernel, if memory encryption is active by default without
this BIOS fix, then the result is AMD-Vi completion-wait timeout messages.
Try booting with mem_encrypt=off on the kernel command line or build a
kernel with CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n and see if that
allows the kernel to boot.

Thanks,
Tom

[1] https://bugzilla.kernel.org/show_bug.cgi?id=199513

> Thanks,
>
> Joe
>
> [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1747463/comments/30
>
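
The triage Tom asks for above (were the AMD-Vi messages completion-wait timeouts or I/O page faults, and was mem_encrypt=off actually on the command line?) could be sketched roughly as below. This is only an illustration, not something posted in the thread: the function names are hypothetical, and the matched message wording ("Completion-Wait loop timed out", "IO_PAGE_FAULT") is an assumption about how the kernel logs these events and may differ between kernel versions.

```python
import re

# Assumed log patterns; the exact wording may vary between kernel versions.
COMPLETION_WAIT = re.compile(r"AMD-Vi:.*completion-wait loop timed out", re.IGNORECASE)
IO_PAGE_FAULT = re.compile(r"AMD-Vi:.*IO_PAGE_FAULT", re.IGNORECASE)


def classify_amd_vi(log_text):
    """Count the two kinds of AMD-Vi messages discussed in the thread."""
    counts = {"completion_wait_timeout": 0, "io_page_fault": 0}
    for line in log_text.splitlines():
        if COMPLETION_WAIT.search(line):
            counts["completion_wait_timeout"] += 1
        elif IO_PAGE_FAULT.search(line):
            counts["io_page_fault"] += 1
    return counts


def has_boot_option(cmdline, option):
    """Return True if `option` is a whole token on the kernel command line."""
    return option in cmdline.split()


if __name__ == "__main__":
    # Made-up sample lines for illustration only; not taken from the bug report.
    sample_log = (
        "AMD-Vi: Completion-Wait loop timed out\n"
        "AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0]\n"
    )
    print(classify_amd_vi(sample_log))
    print(has_boot_option("ro quiet mem_encrypt=off", "mem_encrypt=off"))
```

On a live system the inputs would be the saved output of dmesg and the contents of /proc/cmdline. Matching whole tokens keeps an unrelated parameter that merely contains the option string from counting as a match.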