Re: Bug#1054514: linux-image-6.1.0-13-amd64: Debian VM with qxl graphics freezes frequently

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Timo,

On Tue, Oct 24, 2023 at 11:14:32PM +0300, Timo Lindfors wrote:
> Package: src:linux
> Version: 6.1.55-1
> Severity: normal
> 
> Steps to reproduce:
> 1) Install Debian 12 as a virtual machine using virt-manager, choose qxl
>    graphics card. You only need basic installation without wayland or X.
> 2) Login from the console and save thë following to reproduce.bash:
> 
> #!/bin/bash
> 
> chvt 3
> for j in $(seq 80); do
>     echo "$(date) starting round $j"
>     if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ];
> then
>         echo "bug was reproduced after $j tries"
>         exit 1
>     fi
>     for i in $(seq 100); do
>         dmesg > /dev/tty3
>     done
> done
> 
> echo "bug could not be reproduced"
> exit 0
> 
> 
> 3) Run chmod a+x reproduce.bash
> 4) Run ./reproduce.bash and wait for up to 20 minutes.
> 
> Expected results:
> 4) The system prints a steady flow of text without kernel error messages
> 
> Actual messages:
> 4) At some point the text stops flowing and the script prints "bug was
>    reproduced". If you run "journalctl --boot" you see
> 
> kernel: [TTM] Buffer eviction failed
> kernel: qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
> kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> 
> 
> 
> More info:
> 1) The bug does not occur if I downgrade the kernel to
>    linux-image-5.10.0-26-amd64_5.10.197-1_amd64.deb from Debian 11.
> 2) I used the following test_linux.bash to bisect this issue against
>    upstream source:
> 
> #!/bin/bash
> set -x
> 
> gitversion="$(git describe HEAD|sed 's@^v@@')"
> 
> git checkout drivers/gpu/drm/ttm/ttm_bo.c include/drm/ttm/ttm_bo_api.h
> git show bec771b5e0901f4b0bc861bcb58056de5151ae3a | patch -p1
> # Build
> cp ~/kernel.config .config
> # cp /boot/config-$(uname -r) .config
> # scripts/config --enable LOCALVERSION_AUTO
> # scripts/config --disable DEBUG_INFO
> # scripts/config --disable SYSTEM_TRUSTED_KEYRING
> # scripts/config --set-str SYSTEM_TRUSTED_KEYS ''
> # scripts/config --disable STACKPROTECTOR_STRONG
> make olddefconfig
> # make localmodconfig
> make -j$(nproc --all) bindeb-pkg
> rc="$?"
> if [ "$rc" != "0" ]; then
>     exit 125
> fi
> git checkout drivers/gpu/drm/ttm/ttm_bo.c include/drm/ttm/ttm_bo_api.h
> 
> package="$(ls --sort=time ../linux-image-*_amd64.deb|head -n1)"
> version=$(echo $package | cut -d_ -f1|cut -d- -f3-)
> 
> if [ "$gitversion" != "$version" ]; then
>     echo "Build produced version $gitversion but got $version, ignoring"
>     #exit 255
> fi
> 
> # Deploy
> scp $package target:a.deb
> ssh target sudo apt install ./a.deb
> ssh target rm -f a.deb
> ssh target ./grub_set_default_version.bash $version
> ssh target sudo shutdown -r now
> sleep 40
> 
> detected_version=$(ssh target uname -r)
> if [ "$detected_version" != "$version" ]; then
>     echo "Booted to $detected_version but expected $version"
>     exit 255
> fi
> 
> # Test
> exec ssh target sudo ./reproduce.bash
> 
> 
> Bisect printed the following log:
> 
> git bisect start
> # bad: [ed29c2691188cf7ea2a46d40b891836c2bd1a4f5] drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.
> git bisect bad ed29c2691188cf7ea2a46d40b891836c2bd1a4f5
> # bad: [762949bb1da78941b25e63f7e952af037eee15a9] drm: fix drm_mode_create_blob comment
> git bisect bad 762949bb1da78941b25e63f7e952af037eee15a9
> # bad: [e40f97ef12772f8eb04b6a155baa1e0e2e8f3ecc] drm/gma500: Drop DRM_GMA600 config option
> git bisect bad e40f97ef12772f8eb04b6a155baa1e0e2e8f3ecc
> # bad: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
> git bisect bad 5a838e5d5825c85556011478abde708251cc0776
> # bad: [d2b6f8a179194de0ffc4886ffc2c4358d86047b8] Merge tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
> git bisect bad d2b6f8a179194de0ffc4886ffc2c4358d86047b8
> # bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
> git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
> # bad: [0698b13403788a646073fcd9b2294f2dce0ce429] drm/amdgpu: skip PP_MP1_STATE_UNLOAD on aldebaran
> git bisect bad 0698b13403788a646073fcd9b2294f2dce0ce429
> # bad: [e1a5e6a8c48bf99ea374fb3e535661cfe226bca4] drm/doc: Add RFC section
> git bisect bad e1a5e6a8c48bf99ea374fb3e535661cfe226bca4
> # bad: [ed29c2691188cf7ea2a46d40b891836c2bd1a4f5] drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.
> git bisect bad ed29c2691188cf7ea2a46d40b891836c2bd1a4f5
> # bad: [2c8ab3339e398bbbcb0980933e266b93bedaae52] drm/i915: Pin timeline map after first timeline pin, v4.
> git bisect bad 2c8ab3339e398bbbcb0980933e266b93bedaae52
> # bad: [2eb8e1a69d9f8cc9c0a75e327f854957224ba421] drm/i915/gem: Drop relocation support on all new hardware (v6)
> git bisect bad 2eb8e1a69d9f8cc9c0a75e327f854957224ba421
> # bad: [b5b6f6a610127b17f20c0ca03dd27beee4ddc2b2] drm/i915/gem: Drop legacy execbuffer support (v2)
> git bisect bad b5b6f6a610127b17f20c0ca03dd27beee4ddc2b2
> # bad: [06debd6e1b28029e6e77c41e59a162868f377897] Merge tag 'drm-intel-next-2021-03-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
> git bisect bad 06debd6e1b28029e6e77c41e59a162868f377897
> # good: [e19eede54240d64b4baf9b0df4dfb8191f7ae48b] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
> git bisect good e19eede54240d64b4baf9b0df4dfb8191f7ae48b
> # good: [1e28eed17697bcf343c6743f0028cc3b5dd88bf0] Linux 5.12-rc3
> git bisect good 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
> # bad: [6af70eb3b40edfc8bdf2373cdc2bcf9d5a20c8c7] drm/atmel-hlcdc: Rename custom plane state variable
> git bisect bad 6af70eb3b40edfc8bdf2373cdc2bcf9d5a20c8c7
> # good: [4ca77c513537700d3fae69030879f781dde1904c] drm/qxl: release shadow on shutdown
> git bisect good 4ca77c513537700d3fae69030879f781dde1904c
> # bad: [4a11bd1e88af130f50a72e0f54391c1c7d268e03] drm/ast: Add constants for VGACRCB register bits
> git bisect bad 4a11bd1e88af130f50a72e0f54391c1c7d268e03
> # bad: [5c209d8056b9763ce544ecd7dadb3782cdaf96ed] drm/gma500: psb_spank() doesn't need it's own file
> git bisect bad 5c209d8056b9763ce544ecd7dadb3782cdaf96ed
> # bad: [db0c6bd2c0c0dada8927cd46a7c34c316a3a6c04] drm/gem: Export drm_gem_vmap() and drm_gem_vunmap()
> git bisect bad db0c6bd2c0c0dada8927cd46a7c34c316a3a6c04
> # bad: [f4a84e165e6d58606097dd07b5b78767a94b870c] drm/qxl: allocate dumb buffers in ram
> git bisect bad f4a84e165e6d58606097dd07b5b78767a94b870c
> # good: [a7709b9b89a67f3ead2d188b1d0c261059b1f291] drm/qxl: handle shadow in primary destroy
> git bisect good a7709b9b89a67f3ead2d188b1d0c261059b1f291
> # bad: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
> git bisect bad 5a838e5d5825c85556011478abde708251cc0776
> # good: [5f6c871fe919999774e8535ea611a6f84ee43ee4] drm/qxl: properly free qxl releases
> git bisect good 5f6c871fe919999774e8535ea611a6f84ee43ee4
> # first bad commit: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
> 
> I took a look at
> 
> commit 5a838e5d5825c85556011478abde708251cc0776 (refs/bisect/bad)
> Author: Gerd Hoffmann <kraxel@xxxxxxxxxx>
> Date:   Thu Feb 4 15:57:10 2021 +0100
> 
>     drm/qxl: simplify qxl_fence_wait
> 
>     Now that we have the new release_event wait queue we can just
>     use that in qxl_fence_wait() and simplify the code a lot.
> 
>     Signed-off-by: Gerd Hoffmann <kraxel@xxxxxxxxxx>
>     Acked-by: Thomas Zimmermann <tzimmermann@xxxxxxx>
>     Link: http://patchwork.freedesktop.org/patch/msgid/20210204145712.1531203-10-kraxel@xxxxxxxxxx
> 
> 
> and noticed that the bug does not occur if I boot 6.1 kernel with this patch
> reverted (see attached file).

Thanks for the excelent constructed report! I think it's best to
forward this directly to upstream including the people for the
bisected commit to get some idea.

Can you reproduce the issue with 6.5.8-1 in unstable as well?

If not, are you able to isolate an upstream fix which should be
backported to the 6.1.y series as well?

Regards,
Salvatore



[Index of Archives]     [Linux Virtualization]     [Linux Virtualization]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]     [Monitors]