Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI Modules for i915/Xe Driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 12, 2024 at 11:58:37AM GMT, Bommu, Krishnaiah wrote:


-----Original Message-----
From: De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>
Sent: Wednesday, September 11, 2024 9:49 PM
To: Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>
Cc: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>; intel-xe@xxxxxxxxxxxxxxxxxxxxx; intel-
gfx@xxxxxxxxxxxxxxxxxxxxx; Kamil Konieczny <kamil.konieczny@xxxxxxxxxxxxxxx>;
Ceraolo Spurio, Daniele <daniele.ceraolospurio@xxxxxxxxx>; Upadhyay, Tejas
<tejas.upadhyay@xxxxxxxxx>; Tvrtko Ursulin <tursulin@xxxxxxxxxxx>; Joonas
Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Nikula, Jani
<jani.nikula@xxxxxxxxx>; Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx>; Teres Alexis, Alan Previn
<alan.previn.teres.alexis@xxxxxxxxx>; Winkler, Tomas
<tomas.winkler@xxxxxxxxx>; Usyskin, Alexander
<alexander.usyskin@xxxxxxxxx>; linux-modules@xxxxxxxxxxxxxxx; Luis
Chamberlain <mcgrof@xxxxxxxxxx>
Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
Modules for i915/Xe Driver

+ linux-modules
+ Luis

On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:
>
>
>> -----Original Message-----
>> From: De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>
>> Sent: Tuesday, September 10, 2024 9:13 PM
>> To: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
>> Cc: Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>; intel-
>> xe@xxxxxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Kamil
>> Konieczny <kamil.konieczny@xxxxxxxxxxxxxxx>; Ceraolo Spurio, Daniele
>> <daniele.ceraolospurio@xxxxxxxxx>; Upadhyay, Tejas
>> <tejas.upadhyay@xxxxxxxxx>; Tvrtko Ursulin <tursulin@xxxxxxxxxxx>;
>> Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Nikula, Jani
>> <jani.nikula@xxxxxxxxx>; Thomas Hellström
>> <thomas.hellstrom@xxxxxxxxxxxxxxx>; Teres Alexis, Alan Previn
>> <alan.previn.teres.alexis@xxxxxxxxx>; Winkler, Tomas
>> <tomas.winkler@xxxxxxxxx>; Usyskin, Alexander
>> <alexander.usyskin@xxxxxxxxx>
>> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
>> Modules for i915/Xe Driver
>>
>> On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>> >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> >> This update addresses the unload/reload sequence of MEI modules in
>> >> relation to the i915/Xe graphics driver. On platforms where the
>> >> MEI hardware is integrated with the graphics device (e.g.,
>> >> DG2/BMG), the i915/xe driver is depend on the MEI modules.
>> >> Conversely, on newer platforms like MTL and LNL, where the MEI
>> >> hardware is separate, this
>> dependency does not exist.
>> >>
>> >> The changes introduced ensure that MEI modules are unloaded and
>> >> reloaded in the correct order based on platform-specific
>> >> dependencies. This is achieved by adding a MODULE_SOFTDEP
>> >> directive to
>> the i915 and Xe module code.
>>
>>
>> can you explain what causes the modules to be loaded today? Also, is
>> this to fix anything related to *loading* order or just unload?
>>
>> >>
>> >> These changes enhance the robustness of MEI module handling across
>> >> different hardware platforms, ensuring that the i915/Xe driver can
>> >> be cleanly unloaded and reloaded without issues.
>> >>
>> >> v2: updated commit message
>> >>
>> >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@xxxxxxxxx>
>> >> Cc: Kamil Konieczny <kamil.konieczny@xxxxxxxxxxxxxxx>
>> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>> >> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>> >> Cc: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx>
>> >> ---
>> >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>> >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>> >>  2 files changed, 4 insertions(+)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_module.c
>> >> b/drivers/gpu/drm/i915/i915_module.c
>> >> index 65acd7bf75d0..2ad079ad35db 100644
>> >> --- a/drivers/gpu/drm/i915/i915_module.c
>> >> +++ b/drivers/gpu/drm/i915/i915_module.c
>> >> @@ -75,6 +75,8 @@ static const struct {  };  static int
>> >> init_progress;
>> >>
>> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >> +
>> >>  static int __init i915_init(void)  {
>> >>  	int err, i;
>> >> diff --git a/drivers/gpu/drm/xe/xe_module.c
>> >> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
>> >> 100644
>> >> --- a/drivers/gpu/drm/xe/xe_module.c
>> >> +++ b/drivers/gpu/drm/xe/xe_module.c
>> >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>> >>  	init_funcs[i].exit();
>> >>  }
>> >>
>> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >
>> >I'm honestly not very comfortable with this.
>> >
>> >1. This is not true for every device supported by these modules.
>> >2. This is not true for every (and the most basic) functionality of these
drivers.
>> >
>> >Shouldn't this be done in the the mei side?
>>
>> I don't think it's possible to do from the mei side. Would mei depend
>> on both xe and i915 (and thus cause both to be loaded regardless of
>> the platform?). For a runtime dependency like this that depends on
>> the platform, I think the best way would be a weakdep + either a
>> request_module() or something else that causes the module to load (is
>> that what comp_* is doing today?)
>>
>> >
>> >Couldn't at probe we identify the need of them and if needed we
>> >return -EPROBE to attempt a retry after the mei drivers were probed?
>>
>> I'm not sure this is fixing anything for probe. I think we already
>> wait on the other component to be ready without blocking the rest of the
driver functionality.
>>
>> A weakdep wouldn't cause the module to be loaded where it's not
>> needed, but need some clarification if this is trying to fix anything load-
related or just unload.
>
>This change is fixing unload.
>During xe load I am seeing mei_gsc modules was loaded, but not unloaded
>during the unload xe

so, first thing: if things are correct in the kernel, we shouldn't need to
**unload** the module after unbinding the device. Why are we unloading xe
and the other modules for tests?

While running gta@xe_module_load@reload-no-display I see failure, to address this failure I have this changes, previously I am trying to fix from IGT, but as per igt review suggestion I am trying to fix issue in kernel,
IGT patch: https://patchwork.freedesktop.org/series/137343/


it seems a mistake in igt to try to remove the mei_gsc module.
As a dgfx, it's even worse - what happens if another card is using the
module? What happens if I have a RPL + BMG and i915 driving the former
while xe drives the latter?

You shouldn't need to remove it.  This works for me with BMG (unbinding
all drivers for simplicity since we are removing the module... but if we
don't remove the module, then we can test with the only device we care
about):


# modprobe xe
# unbind
Unbinding /sys/bus/pci/devices/0000:00:02.0 (8086:a782)... ok
Unbinding /sys/bus/pci/devices/0000:03:00.0 (8086:e20b)... ok
# lsmod | grep -e xe -e mei_gsc
xe                   3584000  0
drm_gpuvm              45056  1 xe
video                  77824  1 xe
i2c_algo_bit           12288  1 xe
drm_ttm_helper         16384  1 xe
gpu_sched              61440  1 xe
drm_suballoc_helper    16384  1 xe
drm_display_helper    270336  1 xe
drm_kunit_helpers      16384  1 xe
drm_buddy              20480  1 xe
ttm                   114688  2 drm_ttm_helper,xe
mei_gsc_proxy          16384  0
mei_gsc                12288  0
drm_exec               16384  2 drm_gpuvm,xe
kunit                  73728  2 xe,drm_kunit_helpers
drm_kms_helper        241664  4 drm_display_helper,drm_ttm_helper,xe,drm_kunit_helpers
mei_me                 65536  3 mei_gsc
mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
drm                   737280  11 gpu_sched,drm_kms_helper,drm_exec,drm_gpuvm,drm_suballoc_helper,drm_display_helper,drm_buddy,drm_ttm_helper,xe,drm_kunit_helpers,ttm
# modprobe -r xe
# modprobe xe probe_display=0
# unbind
Unbinding /sys/bus/pci/devices/0000:00:02.0 (8086:a782)... ok
Unbinding /sys/bus/pci/devices/0000:03:00.0 (8086:e20b)... ok
# modprobe -r xe
# modprobe xe

I didn't check if mei_gsc continues to work after reload, but I guess so
as its refcount is incremented:

mei_gsc                12288  1


unbind function is this:

function unbind {
        vga="0300"
        display="0380"
        pci_vendor="8086"

        while read -r pci_slot class devid xxx; do
                sysdev=/sys/bus/pci/devices/0000:$pci_slot

                echo -n "Unbinding $sysdev ($devid)... "
                if [ ! -e "$sysdev/driver" ]; then
                        echo "(skip: not bound)"
                        continue
                fi

                echo -n auto > ${sysdev}/power/control
                echo -n "0000:$pci_slot" > $sysdev/driver/unbind
                echo "ok"
        done <<<$(lspci -d ${pci_vendor}::${display} -n; lspci -d ${pci_vendor}::${vga} -n )
}


So... for igt: I *think* simply removing the array with modules to
unload first would fix it.

Lucas De Marchi


>root@DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after
>system reboot root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  2
>mei                   167936  5 mei_hdcp,mei_pxp,mei_me
>root@DUT6127BMGFRD:/home/gta# lsmod | grep xe
>root@DUT6127BMGFRD:/home/gta# root@DUT6127BMGFRD:/home/gta#
modprobe xe
>root@DUT6127BMGFRD:/home/gta# root@DUT6127BMGFRD:/home/gta#
lsmod |
>grep mei
>mei_gsc_proxy          16384  0
>mei_gsc                12288  1

			       ^ which means there's one user, which
			         should be xe

>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  3 mei_gsc
>mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta#
>root@DUT6127BMGFRD:/home/gta# init 3
>root@DUT6127BMGFRD:/home/gta# echo -n auto >
>/sys/bus/pci/devices/0000\:03\:00.0/power/control
>root@DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" >
>/sys/bus/pci/drivers/xe/unbind root@DUT6127BMGFRD:/home/gta#
modprobe
>-r xe root@DUT6127BMGFRD:/home/gta#
root@DUT6127BMGFRD:/home/gta# lsmod
>| grep xe root@DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_gsc_proxy          16384  0
>mei_gsc                12288  0

			       ^ great, so the refcount went to 0,
			         confirming it was xe. It should go to 0
				 even before you unload the module,
				 when unbind.

A couple of points:

1) why do we care about unloading mei_gsc. Just loading xe
    again (or even not even unloading it, just unbind/rebind),
    should still work if the xe <-> mei_gsc integration is done
    correctly.

2) If for some reason we do want to remove the module, then we will
    need some work in kernel/module/  to start tracking runtime module
    dependencies, i.e. when one module does a module_get(foo->owner), it
    would add to a list and output on sysfs together with the holders list.
    This way you would be able to track the runtime deps and remove them
    if their refcount went to 0 after removing xe.

(2) is doable, but previous attempts were not successful [1]. Is  there something
else to make the simpler solution (1) to work?


Reference why I am doing this changes, please see review comments of this patch https://patchwork.freedesktop.org/series/137343/

Regards,
Krishna.

thanks
Lucas De Marchi

[1] https://lore.kernel.org/linux-
modules/cover.1652113087.git.mchehab@xxxxxxxxxx/

>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  3 mei_gsc
>mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>root@DUT6127BMGFRD:/home/gta#
>
>Regards,
>Krishna.
>
>>
>> Lucas De Marchi
>>
>> >
>> >Cc: Alexander Usyskin <alexander.usyskin@xxxxxxxxx>
>> >Cc: Tomas Winkler <tomas.winkler@xxxxxxxxx>
>> >Cc: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
>> >Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>> >Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>> >Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
>> >Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
>> >Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
>> >Cc: Tvrtko Ursulin <tursulin@xxxxxxxxxxx>
>> >
>> >> +
>> >>  static int __init xe_init(void)
>> >>  {
>> >>  	int err, i;
>> >> --
>> >> 2.25.1
>> >>



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux