Re: [REGRESSION] QXL display malfunction


 




On 01.07.24 at 12:02, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, was there any progress on fixing the regression below? I might
have missed something, but from here it looks like this fell through the
cracks.

Thanks for reminding.



Makes me wonder if we should temporarily revert this for now to fix this
for rc7 and ensure things get at least one week of testing before the final.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 14.06.24 15:45, Kaplan, David wrote:
[AMD Official Use Only - AMD Internal Distribution Only]

-----Original Message-----
From: Thomas Zimmermann <tzimmermann@xxxxxxx>
Sent: Wednesday, June 12, 2024 9:26 AM
To: Linux regressions mailing list <regressions@xxxxxxxxxxxxxxx>
Cc: Petkov, Borislav <Borislav.Petkov@xxxxxxx>;
zack.rusin@xxxxxxxxxxxx; dmitry.osipenko@xxxxxxxxxxxxx; Kaplan, David
<David.Kaplan@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>;
Dave Airlie <airlied@xxxxxxxxxx>; Maarten Lankhorst
<maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
<mripard@xxxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; ML dri-devel
<dri-devel@xxxxxxxxxxxxxxxxxxxxx>; spice-devel@xxxxxxxxxxxxxxxxxxxxx;
virtualization@xxxxxxxxxxxxxxx
Subject: Re: [REGRESSION] QXL display malfunction



Hi

On 12.06.24 at 14:41, Linux regression tracking (Thorsten Leemhuis) wrote:
[CCing a few more people and lists that get_maintainers pointed out
for qxl]

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, from here it looks like this report that apparently is caused
by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
("drm/qxl: Do not pin buffer objects for vmap")) fell through the
cracks. Or was progress made to resolve this and I just missed this?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


On 03.06.24 04:29, Kaplan, David wrote:
-----Original Message-----
From: Kaplan, David
Sent: Sunday, June 2, 2024 9:25 PM
To: tzimmermann@xxxxxxx; dmitry.osipenko@xxxxxxxxxxxxx; Koenig,
Christian <Christian.Koenig@xxxxxxx>; zach.rusin@xxxxxxxxxxxx
Cc: Petkov, Borislav <Borislav.Petkov@xxxxxxx>;
regressions@xxxxxxxxxxxxxx
Subject: [REGRESSION] QXL display malfunction

Hi,

I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
and I've observed the VM graphics often malfunction after boot,
sometimes failing to load the Ubuntu desktop or even immediately
shutting the guest down.
When it does load, the guest dmesg log often contains errors like

[    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
[    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
[    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
I don't see how these messages are related. Did they already appear before
the broken commit was there?
No, I did not observe them prior to the broken commit.

[    5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
Is there only one such message in the log, or are there multiple or frequent ones?
I would usually only see one.

Could you provide a stack trace of what happens before?
Here's the top of a backtrace when the error occurs:
#0  qxl_release_from_id_locked (qdev=qdev@entry=0xffff88810126e000, id=id@entry=262151)
     at drivers/gpu/drm/qxl/qxl_release.c:373
#1  0xffffffff819f5b6a in qxl_garbage_collect (qdev=0xffff88810126e000)
     at drivers/gpu/drm/qxl/qxl_cmd.c:222
#2  0xffffffff810e3aa8 in process_one_work (worker=worker@entry=0xffff888101680300,
     work=0xffff88810126f340) at kernel/workqueue.c:3231
#3  0xffffffff810e6281 in process_scheduled_works (worker=<optimized out>)
     at kernel/workqueue.c:3312
#4  worker_thread (__worker=0xffff888101680300) at kernel/workqueue.c:3393

We sometimes draw into the buffer object from the CPU. For accessing the
buffer object's pages from the CPU, only a vmap operation should be
necessary. It appears as if qxl also requires a pin. My guess is that the pin
inserts the buffer-object's host-side pages and the code around
qxl_release_from_id_locked() appears to be garbage-collecting them.
Hence without the pin, the GC complains about inconsistent state.
I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
(b33651a5c98dbd5a919219d8c129d0674ef74299).
Thanks for bisecting. Does it work if you revert that commit?
Yes

Thanks --David Kaplan

--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



