On Wed, May 22, 2024 at 12:58 AM Armin Wolf <W_Armin@xxxxxx> wrote:
>
> On 20.05.24 at 18:22, Alex Deucher wrote:
>
> > On Sat, May 18, 2024 at 8:17 PM Armin Wolf <W_Armin@xxxxxx> wrote:
> >> On 17.05.24 at 03:30, Barry Kauler wrote:
> >>
> >>> Armin, Yifan, Prike,
> >>> I will top-post, so you don't have to scroll down.
> >>> After identifying the commit that causes black screen with my gpu, I
> >>> posted the result to you guys, on May 9.
> >>> It is now May 17 and no reply.
> >>> OK, I have now created a patch that reverts Yifan's commit, compiled
> >>> 5.15.158, and my gpu now works.
> >>> Note, the radeon module is not loaded, so it is not a factor.
> >>> I'm not a kernel developer. I have identified the culprit and it is up
> >>> to you guys to fix it, Yifan especially, as you are the person who has
> >>> created the regression.
> >>> I will attach my patch.
> >>> Regards,
> >>> Barry Kauler
> >> Hi,
> >>
> >> sorry for not responding to your findings. I normally do not work with GPU drivers,
> >> so I hoped one of the amdgpu developers would handle this.
> >>
> >> I CCed dri-devel@xxxxxxxxxxxxxxxxxxxxx and amd-gfx@xxxxxxxxxxxxxxxxxxxxx so that other
> >> amdgpu developers hear about this issue.
> >>
> >> Thank you for your persistence in finding the offending commit.
> > Likely this patch should not have been ported to 5.15 in the first
> > place. The IOMMU requirements have been dropped from the driver for
> > the last few kernel versions so it is no longer relevant on newer
> > kernels.
> >
> > Alex
>
> Barry, can you verify that the latest upstream kernel works on your device?
> If yes, then the commit itself is ok and just the backporting itself was wrong.
>
> Thanks,
> Armin Wolf

Armin,
The unmodified 6.8.1 kernel works ok. I presume that patch was applied
long before 6.8.1 got released and only got backported to 5.15.x recently.
Regards,
Barry

> >> Armin Wolf
> >>
> >>> On Thu, May 9, 2024 at 4:08 PM Barry Kauler <bkauler@xxxxxxxxx> wrote:
> >>>> On Fri, May 3, 2024 at 9:03 PM Armin Wolf <W_Armin@xxxxxx> wrote:
> >>>>>> ...
> >>>>>> # lspci | grep VGA
> >>>>>> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >>>>>> [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
> >>>>>> Series] (rev c2)
> >>>>>> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
> >>>>>> [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
> >>>>>>
> >>>>>> # lspci -n -k
> >>>>>> ...
> >>>>>> 05:00.0 0300: 1002:15d8 (rev c2)
> >>>>>> Subsystem: 1025:1456
> >>>>>> Kernel driver in use: amdgpu
> >>>>>> Kernel modules: amdgpu
> >>>>>> ...
> >>>>> thanks for informing us of this regression. Since there are four commits affecting
> >>>>> amdgpu in 5.15.150, i suggest that you use "git bisect" to find the faulty commits,
> >>>>> see https://docs.kernel.org/admin-guide/bug-bisect.html for details.
> >>>>>
> >>>>> I think you can speed up the bisecting process by limiting yourself to the AMD DRM
> >>>>> driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the
> >>>>> man page of "git bisect" for details.
> >>>>>
> >>>>> Thanks,
> >>>>> Armin Wolf
> >>>> Armin,
> >>>> Thanks for the advice. I am unfamiliar with git on the commandline.
> >>>> Previously only used SmartGit gui.
> >>>> EasyOS requires aufs patch, and for a few days tried to figure out how
> >>>> to use that with git bisect, then gave up. Changed to testing with my
> >>>> "QV" distro, which is more conventional, doesn't need any kernel
> >>>> patches. Managed to get it down to one commit.
> >>>> Here are the steps I followed:
> >>>>
> >>>> # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> >>>> # cd linux-stable
> >>>> # git tag -l | grep '5\.15\.150'
> >>>> v5.15.150
> >>>> # git checkout -b my5.15.150 v5.15.150
> >>>> Updating files: 100% (65776/65776), done.
> >>>> Switched to a new branch 'my5.15.150'
> >>>>
> >>>> Copied in my .config then...
> >>>>
> >>>> # make menuconfig
> >>>> # git bisect start -- drivers/gpu/drm/amd
> >>>> # git bisect bad
> >>>> # git bisect good v5.15.149
> >>>> Bisecting: 1 revision left to test after this (roughly 1 step)
> >>>> [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for s3 suspend abort case
> >>>> # make
> >>>> # rm -rf /boot2
> >>>> # mkdir -p /boot2/lib/modules
> >>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
> >>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
> >>>> # sync
> >>>> ...QV on Acer laptop, with amdgpu, works!
> >>>> # git bisect good
> >>>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> >>>> [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu after amdkfd device init
> >>>> # make
> >>>> # mkdir -p /boot2/lib/modules
> >>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
> >>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
> >>>> # sync
> >>>> ...QV on Acer laptop, black screen!
> >>>>
> >>>> # git bisect bad
> >>>> 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
> >>>> commit 56b522f4668167096a50c39446d6263c96219f5f
> >>>> Author: Yifan Zhang <yifan1.zhang@xxxxxxx>
> >>>> Date: Tue Sep 28 15:42:35 2021 +0800
> >>>>
> >>>> drm/amdgpu: init iommu after amdkfd device init
> >>>>
> >>>> [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]
> >>>>
> >>>> This patch is to fix clinfo failure in Raven/Picasso:
> >>>>
> >>>> Number of platforms: 1
> >>>> Platform Profile: FULL_PROFILE
> >>>> Platform Version: OpenCL 2.2 AMD-APP (3364.0)
> >>>> Platform Name: AMD Accelerated Parallel Processing
> >>>> Platform Vendor: Advanced Micro Devices, Inc.
> >>>> Platform Extensions: cl_khr_icd cl_amd_event_callback
> >>>>
> >>>> Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
> >>>>
> >>>> Signed-off-by: Yifan Zhang <yifan1.zhang@xxxxxxx>
> >>>> Reviewed-by: James Zhu <James.Zhu@xxxxxxx>
> >>>> Tested-by: James Zhu <James.Zhu@xxxxxxx>
> >>>> Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
> >>>> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
> >>>> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> >>>>
> >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >>>> 1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> Anything else I should do, to identify what in this commit is the
> >>>> likely culprit?
> >>>> Regards,
> >>>> Barry Kauler