RE: Regression on linux-next (next-20241120) and drm-tip

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
> Sent: Tuesday, December 3, 2024 8:20 PM
> To: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> Cc: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>; Kurmi,
> Suresh Kumar <suresh.kumar.kurmi@xxxxxxxxx>; Coelho, Luciano
> <luciano.coelho@xxxxxxxxx>; Saarinen, Jani <jani.saarinen@xxxxxxxxx>;
> Nikula, Jani <jani.nikula@xxxxxxxxx>; De Marchi, Lucas
> <lucas.demarchi@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; intel-
> xe@xxxxxxxxxxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx; Sebastian Reichel
> <sebastian.reichel@xxxxxxxxxxxxx>
> Subject: Re: Regression on linux-next (next-20241120) and drm-tip
> 
> On 2024-12-03 15:33:21+0100, Rafael J. Wysocki wrote:
> > On Tue, Dec 3, 2024 at 1:04 PM Thomas Weißschuh
> <linux@xxxxxxxxxxxxxx> wrote:
> > >
> > > On 2024-12-03 12:54:54+0100, Rafael J. Wysocki wrote:
> > > > On Tue, Dec 3, 2024 at 7:51 AM Thomas Weißschuh
> <linux@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > (+Cc Sebastian)
> > > > >
> > > > > Hi Chaitanya,
> > > > >
> > > > > On 2024-12-03 05:07:47+0000, Borah, Chaitanya Kumar wrote:
> > > > > > Hope you are doing well. I am Chaitanya from the linux graphics team
> in Intel.
> > > > > >
> > > > > > This mail is regarding a regression we are seeing in our CI runs[1] on
> linux-next repository.
> > > > >
> > > > > Thanks for the report.
> > > > >
> > > > > > Since the version next-20241120 [2], we are seeing the
> > > > > > following regression
> > > > > >
> > > > > > `````````````````````````````````````````````````````````````````````````````````
> > > > > > <4>[   19.990743] Oops: general protection fault, probably for non-
> canonical address 0xb11675ef8d1ccbce: 0000 [#1] PREEMPT SMP NOPTI
> > > > > > <4>[   19.990760] CPU: 21 UID: 110 PID: 867 Comm: prometheus-
> node Not tainted 6.12.0-next-20241120-next-20241120-gac24e26aa08f+
> #1
> > > > > > <4>[   19.990771] Hardware name: Intel Corporation Arrow Lake
> Client Platform/MTL-S UDIMM 2DPC EVCRB, BIOS
> MTLSFWI1.R00.4400.D85.2410100007 10/10/2024
> > > > > > <4>[   19.990782] RIP: 0010:power_supply_get_property+0x3e/0xe0
> > > > > > ``````````````````````````````````````````````````````````````
> > > > > > ``````````````````` Details log can be found in [3].
> > > > > >
> > > > > > After bisecting the tree, the following patch [4] seems to be the first
> "bad"
> > > > > > commit
> > > > > >
> > > > > > ``````````````````````````````````````````````````````````````
> > > > > > ```````````````````````````````````````````
> > > > > > Commit 49000fee9e639f62ba1f965ed2ae4c5ad18d19e2
> > > > > > Author:     Thomas Weißschuh <mailto:linux@xxxxxxxxxxxxxx>
> > > > > > AuthorDate: Sat Oct 5 12:05:03 2024 +0200
> > > > > > Commit:     Sebastian Reichel
> <mailto:sebastian.reichel@xxxxxxxxxxxxx>
> > > > > > CommitDate: Tue Oct 15 22:22:20 2024 +0200
> > > > > >     power: supply: core: add wakeup source inhibit by
> > > > > > power_supply_config
> > > > > > ``````````````````````````````````````````````````````````````
> > > > > > ```````````````````````````````````````````
> > > > > >
> > > > > > This is now seen in our drm-tip runs as well. [5]
> > > > > >
> > > > > > Could you please check why the patch causes this regression and
> provide a fix if necessary?
> > > > >
> > > > > I don't see how this patch can lead to this error.
> > > >
> > > > It looks like the cfg->no_wakeup_source access reaches beyond the
> > > > struct boundary for some reason.
> > >
> > > But the access to this field is only done in __power_supply_register().
> > > The error reports however don't show this function at all, they come
> > > from power_supply_uevent() and power_supply_get_property() by which
> > > time the call to __power_supply_register() is long over.
> > >
> > > FWIW there is an uninitialized 'struct power_supply_config' in
> > > drivers/hid/hid-corsair-void.c. But I highly doubt the test machines
> > > are using that. (I'll send a patch later for it)
> >
> > So the only way I can think about in which the commit in question may
> > lead to the reported issues is that changing the size of struct
> > power_supply_config or its alignment makes an unexpected functional
> > difference somewhere.
> 
> Indeed. I'd really like to see this issue reproduced with KASAN.
> 
> > AFAICS, this commit cannot be reverted by itself, so which commits on
> > top of it need to be reverted in order to revert it cleanly?
> 
> All the other patches from this series:
> https://lore.kernel.org/lkml/20241005-power-supply-no-wakeup-source-v1-
> 0-1d62bf9bcb1d@xxxxxxxxxxxxxx/
> 
> Could you point me to the full boot log in the drm-tip CI?

Here is the log for drm-tip CI

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8136/bat-arls-5/boot0.txt

I carried out another bisect and it points to the following commit

commit 226ff2e681d006eada59a9693aa1976d4c15a7d4
Author: Heikki Krogerus <heikki.krogerus@xxxxxxxxxxxxxxx>
Date:   Wed Nov 6 17:06:05 2024 +0200

    usb: typec: ucsi: Convert connector specific commands to bitmaps

    That allows the fields in those command data structures to
    be easily validated. If an unsupported field is accessed, a
    warning is generated.

Reverting it seems to help locally. However, to confirm I have sent out a patch to our "try-bot"

https://patchwork.freedesktop.org/series/142049/

We can wait for its results.

Regards

Chaitanya





[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux