Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI Modules for i915/Xe Driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+ linux-modules
+ Luis

On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:


-----Original Message-----
From: De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>
Sent: Tuesday, September 10, 2024 9:13 PM
To: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
Cc: Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>; intel-
xe@xxxxxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kamil Konieczny
<kamil.konieczny@xxxxxxxxxxxxxxx>; Ceraolo Spurio, Daniele
<daniele.ceraolospurio@xxxxxxxxx>; Upadhyay, Tejas
<tejas.upadhyay@xxxxxxxxx>; Tvrtko Ursulin <tursulin@xxxxxxxxxxx>; Joonas
Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Nikula, Jani
<jani.nikula@xxxxxxxxx>; Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx>; Teres Alexis, Alan Previn
<alan.previn.teres.alexis@xxxxxxxxx>; Winkler, Tomas
<tomas.winkler@xxxxxxxxx>; Usyskin, Alexander
<alexander.usyskin@xxxxxxxxx>
Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
Modules for i915/Xe Driver

On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> This update addresses the unload/reload sequence of MEI modules in
>> relation to the i915/Xe graphics driver. On platforms where the MEI
>> hardware is integrated with the graphics device (e.g., DG2/BMG), the
>> i915/xe driver is depend on the MEI modules. Conversely, on newer
>> platforms like MTL and LNL, where the MEI hardware is separate, this
dependency does not exist.
>>
>> The changes introduced ensure that MEI modules are unloaded and
>> reloaded in the correct order based on platform-specific
>> dependencies. This is achieved by adding a MODULE_SOFTDEP directive to
the i915 and Xe module code.


can you explain what causes the modules to be loaded today? Also, is this to fix
anything related to *loading* order or just unload?

>>
>> These changes enhance the robustness of MEI module handling across
>> different hardware platforms, ensuring that the i915/Xe driver can be
>> cleanly unloaded and reloaded without issues.
>>
>> v2: updated commit message
>>
>> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@xxxxxxxxx>
>> Cc: Kamil Konieczny <kamil.konieczny@xxxxxxxxxxxxxxx>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>> Cc: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx>
>> ---
>>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_module.c
>> b/drivers/gpu/drm/i915/i915_module.c
>> index 65acd7bf75d0..2ad079ad35db 100644
>> --- a/drivers/gpu/drm/i915/i915_module.c
>> +++ b/drivers/gpu/drm/i915/i915_module.c
>> @@ -75,6 +75,8 @@ static const struct {  };  static int
>> init_progress;
>>
>> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> +
>>  static int __init i915_init(void)
>>  {
>>  	int err, i;
>> diff --git a/drivers/gpu/drm/xe/xe_module.c
>> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
>> 100644
>> --- a/drivers/gpu/drm/xe/xe_module.c
>> +++ b/drivers/gpu/drm/xe/xe_module.c
>> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>>  	init_funcs[i].exit();
>>  }
>>
>> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>
>I'm honestly not very comfortable with this.
>
>1. This is not true for every device supported by these modules.
>2. This is not true for every (and the most basic) functionality of these drivers.
>
>Shouldn't this be done in the the mei side?

I don't think it's possible to do from the mei side. Would mei depend on both xe
and i915 (and thus cause both to be loaded regardless of the platform?). For a
runtime dependency like this that depends on the platform, I think the best way
would be a weakdep + either a request_module() or something else that causes
the module to load (is that what comp_* is doing today?)

>
>Couldn't at probe we identify the need of them and if needed we return
>-EPROBE to attempt a retry after the mei drivers were probed?

I'm not sure this is fixing anything for probe. I think we already wait on the other
component to be ready without blocking the rest of the driver functionality.

A weakdep wouldn't cause the module to be loaded where it's not needed, but
need some clarification if this is trying to fix anything load-related or just unload.

This change is fixing unload.
During xe load I am seeing mei_gsc modules was loaded, but not unloaded during the unload xe

so, first thing: if things are correct in the kernel, we shouldn't need to **unload** the module
after unbinding the device. Why are we unloading xe and the other
modules for tests?

root@DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_hdcp               28672  0
mei_pxp                16384  0
mei_me                 49152  2
mei                   167936  5 mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# modprobe xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_gsc_proxy          16384  0
mei_gsc                12288  1

			       ^ which means there's one user, which
			         should be xe

mei_hdcp               28672  0
mei_pxp                16384  0
mei_me                 49152  3 mei_gsc
mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# init 3
root@DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
root@DUT6127BMGFRD:/home/gta# modprobe -r xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_gsc_proxy          16384  0
mei_gsc                12288  0

			       ^ great, so the refcount went to 0,
			         confirming it was xe. It should go to 0
				 even before you unload the module,
				 when unbind.

A couple of points:

1) why do we care about unloading mei_gsc. Just loading xe
   again (or even not even unloading it, just unbind/rebind),
   should still work if the xe <-> mei_gsc integration is done
   correctly.

2) If for some reason we do want to remove the module, then we will
   need some work in kernel/module/  to start tracking runtime module
   dependencies, i.e. when one module does a module_get(foo->owner), it
   would add to a list and output on sysfs together with the holders list.
   This way you would be able to track the runtime deps and remove them
   if their refcount went to 0 after removing xe.

(2) is doable, but previous attempts were not successful [1]. Is  there
something else to make the simpler solution (1) to work?

thanks
Lucas De Marchi

[1] https://lore.kernel.org/linux-modules/cover.1652113087.git.mchehab@xxxxxxxxxx/

mei_hdcp               28672  0
mei_pxp                16384  0
mei_me                 49152  3 mei_gsc
mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta#

Regards,
Krishna.


Lucas De Marchi

>
>Cc: Alexander Usyskin <alexander.usyskin@xxxxxxxxx>
>Cc: Tomas Winkler <tomas.winkler@xxxxxxxxx>
>Cc: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
>Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
>Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
>Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
>Cc: Tvrtko Ursulin <tursulin@xxxxxxxxxxx>
>
>> +
>>  static int __init xe_init(void)
>>  {
>>  	int err, i;
>> --
>> 2.25.1
>>



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux