Re: [REGRESSION] QXL display malfunction

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, was there some progress on fixing the regression below? I might
have missed something, but from here it looks like this fell through the
cracks.

This makes me wonder if we should temporarily revert the culprit to fix
this for rc7 and ensure things get at least one week of testing before the
final release.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 14.06.24 15:45, Kaplan, David wrote:
>> -----Original Message-----
>> From: Thomas Zimmermann <tzimmermann@xxxxxxx>
>> Sent: Wednesday, June 12, 2024 9:26 AM
>> To: Linux regressions mailing list <regressions@xxxxxxxxxxxxxxx>
>> Cc: Petkov, Borislav <Borislav.Petkov@xxxxxxx>;
>> zack.rusin@xxxxxxxxxxxx; dmitry.osipenko@xxxxxxxxxxxxx; Kaplan, David
>> <David.Kaplan@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>;
>> Dave Airlie <airlied@xxxxxxxxxx>; Maarten Lankhorst
>> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
>> <mripard@xxxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; ML dri-devel
>> <dri-devel@xxxxxxxxxxxxxxxxxxxxx>; spice-devel@xxxxxxxxxxxxxxxxxxxxx;
>> virtualization@xxxxxxxxxxxxxxx
>> Subject: Re: [REGRESSION] QXL display malfunction
>>
>>
>> Hi
>>
>> Am 12.06.24 um 14:41 schrieb Linux regression tracking (Thorsten Leemhuis):
>>> [CCing a few more people and lists that get_maintainers pointed out
>>> for qxl]
>>>
>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>> for once, to make this easily accessible to everyone.
>>>
>>> Thomas, from here it looks like this report that apparently is caused
>>> by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
>>> ("drm/qxl: Do not pin buffer objects for vmap")) fell through the
>>> cracks. Or was progress made to resolve this and I just missed this?
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
>>> hat)
>>> --
>>> Everything you wanna know about Linux kernel regression tracking:
>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>> If I did something stupid, please tell me, as explained on that page.
>>>
>>> #regzbot poke
>>>
>>>
>>> On 03.06.24 04:29, Kaplan, David wrote:
>>>>> -----Original Message-----
>>>>> From: Kaplan, David
>>>>> Sent: Sunday, June 2, 2024 9:25 PM
>>>>> To: tzimmermann@xxxxxxx; dmitry.osipenko@xxxxxxxxxxxxx; Koenig,
>>>>> Christian <Christian.Koenig@xxxxxxx>; zach.rusin@xxxxxxxxxxxx
>>>>> Cc: Petkov, Borislav <Borislav.Petkov@xxxxxxx>;
>>>>> regressions@xxxxxxxxxxxxxx
>>>>> Subject: [REGRESSION] QXL display malfunction
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
>>>>> and I've observed the VM graphics often malfunction after boot,
>>>>> sometimes failing to load the Ubuntu desktop or even immediately
>>>>> shutting the guest down.
>>>>> When it does load, the guest dmesg log often contains errors like
>>>>>
>>>>> [    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>>>> [    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>>>> [    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
>>
>> I don't see how these messages are related. Did they already appear before
>> the broken commit was there?
> 
> No, I did not observe them prior to the broken commit.
> 
>>
>>>>> [    5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
>>
>> Is there only one such message in the log, or are there multiple/frequent ones?
> 
> I would usually only see one.
> 
>>
>> Could you provide a stack trace of what happens before?
> 
> Here's the top of a backtrace when the error occurs:
> #0  qxl_release_from_id_locked (qdev=qdev@entry=0xffff88810126e000, id=id@entry=262151)
>     at drivers/gpu/drm/qxl/qxl_release.c:373
> #1  0xffffffff819f5b6a in qxl_garbage_collect (qdev=0xffff88810126e000)
>     at drivers/gpu/drm/qxl/qxl_cmd.c:222
> #2  0xffffffff810e3aa8 in process_one_work (worker=worker@entry=0xffff888101680300,
>     work=0xffff88810126f340) at kernel/workqueue.c:3231
> #3  0xffffffff810e6281 in process_scheduled_works (worker=<optimized out>)
>     at kernel/workqueue.c:3312
> #4  worker_thread (__worker=0xffff888101680300) at kernel/workqueue.c:3393
> 
>>
>> We sometimes draw into the buffer object from the CPU. For accessing the
>> buffer object's pages from the CPU, only a vmap operation should be
>> necessary. It appears as if qxl also requires a pin. My guess is that the pin
>> inserts the buffer object's host-side pages and the code around
>> qxl_release_from_id_locked() appears to be garbage-collecting them.
>> Hence, without the pin, the GC complains about inconsistent state.
>>>>>
>>>>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
>>>>> (b33651a5c98dbd5a919219d8c129d0674ef74299).
>>
>> Thanks for bisecting. Does it work if you revert that commit?
> 
> Yes
> 
> Thanks --David Kaplan
