+ linux-modules
+ Luis
On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:
-----Original Message-----
From: De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>
Sent: Tuesday, September 10, 2024 9:13 PM
To: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
Cc: Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>; intel-
xe@xxxxxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kamil Konieczny
<kamil.konieczny@xxxxxxxxxxxxxxx>; Ceraolo Spurio, Daniele
<daniele.ceraolospurio@xxxxxxxxx>; Upadhyay, Tejas
<tejas.upadhyay@xxxxxxxxx>; Tvrtko Ursulin <tursulin@xxxxxxxxxxx>; Joonas
Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Nikula, Jani
<jani.nikula@xxxxxxxxx>; Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx>; Teres Alexis, Alan Previn
<alan.previn.teres.alexis@xxxxxxxxx>; Winkler, Tomas
<tomas.winkler@xxxxxxxxx>; Usyskin, Alexander
<alexander.usyskin@xxxxxxxxx>
Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
Modules for i915/Xe Driver
On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> This update addresses the unload/reload sequence of MEI modules in
>> relation to the i915/Xe graphics driver. On platforms where the MEI
>> hardware is integrated with the graphics device (e.g., DG2/BMG), the
>> i915/xe driver is depend on the MEI modules. Conversely, on newer
>> platforms like MTL and LNL, where the MEI hardware is separate, this
dependency does not exist.
>>
>> The changes introduced ensure that MEI modules are unloaded and
>> reloaded in the correct order based on platform-specific
>> dependencies. This is achieved by adding a MODULE_SOFTDEP directive to
the i915 and Xe module code.
can you explain what causes the modules to be loaded today? Also, is this to fix
anything related to *loading* order or just unload?
>>
>> These changes enhance the robustness of MEI module handling across
>> different hardware platforms, ensuring that the i915/Xe driver can be
>> cleanly unloaded and reloaded without issues.
>>
>> v2: updated commit message
>>
>> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@xxxxxxxxx>
>> Cc: Kamil Konieczny <kamil.konieczny@xxxxxxxxxxxxxxx>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>> Cc: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx>
>> ---
>> drivers/gpu/drm/i915/i915_module.c | 2 ++
>> drivers/gpu/drm/xe/xe_module.c | 2 ++
>> 2 files changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_module.c
>> b/drivers/gpu/drm/i915/i915_module.c
>> index 65acd7bf75d0..2ad079ad35db 100644
>> --- a/drivers/gpu/drm/i915/i915_module.c
>> +++ b/drivers/gpu/drm/i915/i915_module.c
>> @@ -75,6 +75,8 @@ static const struct { }; static int
>> init_progress;
>>
>> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> +
>> static int __init i915_init(void)
>> {
>> int err, i;
>> diff --git a/drivers/gpu/drm/xe/xe_module.c
>> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
>> 100644
>> --- a/drivers/gpu/drm/xe/xe_module.c
>> +++ b/drivers/gpu/drm/xe/xe_module.c
>> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>> init_funcs[i].exit();
>> }
>>
>> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>
>I'm honestly not very comfortable with this.
>
>1. This is not true for every device supported by these modules.
>2. This is not true for every (and the most basic) functionality of these drivers.
>
>Shouldn't this be done in the the mei side?
I don't think it's possible to do from the mei side. Would mei depend on both xe
and i915 (and thus cause both to be loaded regardless of the platform?). For a
runtime dependency like this that depends on the platform, I think the best way
would be a weakdep + either a request_module() or something else that causes
the module to load (is that what comp_* is doing today?)
>
>Couldn't at probe we identify the need of them and if needed we return
>-EPROBE to attempt a retry after the mei drivers were probed?
I'm not sure this is fixing anything for probe. I think we already wait on the other
component to be ready without blocking the rest of the driver functionality.
A weakdep wouldn't cause the module to be loaded where it's not needed, but
need some clarification if this is trying to fix anything load-related or just unload.
This change is fixing unload.
During xe load I am seeing mei_gsc modules was loaded, but not unloaded during the unload xe
so, first thing: if things are correct in the kernel, we shouldn't need to **unload** the module
after unbinding the device. Why are we unloading xe and the other
modules for tests?
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_hdcp 28672 0
mei_pxp 16384 0
mei_me 49152 2
mei 167936 5 mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# modprobe xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_gsc_proxy 16384 0
mei_gsc 12288 1
^ which means there's one user, which
should be xe
mei_hdcp 28672 0
mei_pxp 16384 0
mei_me 49152 3 mei_gsc
mei 167936 8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# init 3
root@DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
root@DUT6127BMGFRD:/home/gta# modprobe -r xe
root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
mei_gsc_proxy 16384 0
mei_gsc 12288 0
^ great, so the refcount went to 0,
confirming it was xe. It should go to 0
even before you unload the module,
when unbind.
A couple of points:
1) why do we care about unloading mei_gsc. Just loading xe
again (or even not even unloading it, just unbind/rebind),
should still work if the xe <-> mei_gsc integration is done
correctly.
2) If for some reason we do want to remove the module, then we will
need some work in kernel/module/ to start tracking runtime module
dependencies, i.e. when one module does a module_get(foo->owner), it
would add to a list and output on sysfs together with the holders list.
This way you would be able to track the runtime deps and remove them
if their refcount went to 0 after removing xe.
(2) is doable, but previous attempts were not successful [1]. Is there
something else to make the simpler solution (1) to work?
thanks
Lucas De Marchi
[1] https://lore.kernel.org/linux-modules/cover.1652113087.git.mchehab@xxxxxxxxxx/
mei_hdcp 28672 0
mei_pxp 16384 0
mei_me 49152 3 mei_gsc
mei 167936 7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
root@DUT6127BMGFRD:/home/gta#
Regards,
Krishna.
Lucas De Marchi
>
>Cc: Alexander Usyskin <alexander.usyskin@xxxxxxxxx>
>Cc: Tomas Winkler <tomas.winkler@xxxxxxxxx>
>Cc: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
>Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
>Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
>Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
>Cc: Tvrtko Ursulin <tursulin@xxxxxxxxxxx>
>
>> +
>> static int __init xe_init(void)
>> {
>> int err, i;
>> --
>> 2.25.1
>>