On Sat, Jul 17, 2021 at 12:48 AM Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote: > In i915_exit(), we check i915_pci_driver.driver.owner to detect if > i915_init exited early and don't tear anything down. However, we didn't > have proper tear-down paths for early exits in i915_init(). > > Most of the time, you would never notice this as driver init failures > are extremely rare and generally the sign of a bigger bug. However, > when the mock self-tests are run, they run as part of i915_init() and > exit early once they complete. They run after i915_globals_init() and > before we set up anything else. The IGT test then unloads the module, > invoking i915_exit() which, thanks to our i915_pci_driver.driver.owner > check, doesn't actually tear anything down. Importantly, this means > i915_globals_exit() never gets called even though i915_globals_init() > was and we leak the globals. > > The most annoying part is that you don't actually notice the failure as > part of the self-tests since leaking a bit of memory, while bad, doesn't > result in anything observable from userspace. Instead, the next time we > load the driver (usually for next IGT test), i915_globals_init() gets > invoked again, we go to allocate a bunch of new memory slabs, those > implicitly create debugfs entries, and debugfs warns that we're trying > to create directories and files that already exist. Since this all > happens as part of the next driver load, it shows up in the dmesg-warn > of whatever IGT test ran after the mock selftests. My idea was to onion-unwind in i915_exit, but that means we need to carry state over or have checks for every step, which is a bit annoying. Yours unwinds even if i915_init returns 0, i.e. success, if we had some selftests, which is most unusual and I think deserves an explainer here in the commit message and maybe somewhere in the code. > Signed-off-by: Jason Ekstrand <jason@xxxxxxxxxxxxxx> > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") > Cc: Daniel Vetter <daniel@xxxxxxxx> > --- > drivers/gpu/drm/i915/i915_globals.c | 4 ++-- > drivers/gpu/drm/i915/i915_pci.c | 23 +++++++++++++++++------ > 2 files changed, 19 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c > index 77f1911c463b8..87267e1d2ad92 100644 > --- a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -138,7 +138,7 @@ void i915_globals_unpark(void) > atomic_inc(&active); > } > > -static void __exit __i915_globals_flush(void) > +static void __i915_globals_flush(void) > { > atomic_inc(&active); /* skip shrinking */ > > @@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void) > atomic_dec(&active); > } > > -void __exit i915_globals_exit(void) > +void i915_globals_exit(void) > { > GEM_BUG_ON(atomic_read(&active)); > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 50ed93b03e582..783f547be0990 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -1199,13 +1199,20 @@ static int __init i915_init(void) > bool use_kms = true; > int err; > > + /* We use this to detect early returns from i915_init() so we don't > + * tear anything down in i915_exit() > + */ > + i915_pci_driver.driver.owner = NULL; Setting this seems redundant? Or if you want to make it explicit, just have a dedicated bool with a big comment explaining that only when we load the full pci driver do we tear down stuff in i915_exit. You could then set after pci_register_driver was successful. Some screaming name like driver_fully_loaded or something like that ... > + > err = i915_globals_init(); > if (err) > return err; > > err = i915_mock_selftests(); > - if (err) > - return err > 0 ? 0 : err; > + if (err) { > + err = err > 0 ? 0 : err; > + goto globals_exit; > + } > > /* > * Enable KMS by default, unless explicitly overriden by Imo move this up, but if you want I can send out my diff so you score an r-b: tag :-) > @@ -1228,13 +1235,17 @@ static int __init i915_init(void) > i915_pmu_init(); > > err = pci_register_driver(&i915_pci_driver); > - if (err) { > - i915_pmu_exit(); > - return err; > - } > + if (err) > + goto pmu_exit; > > i915_perf_sysctl_register(); > return 0; > + We unwind even on success, which is most unusual. I think that deserves a comment. > +pmu_exit: > + i915_pmu_exit(); > +globals_exit: > + i915_globals_exit(); > + return err; > } > > static void __exit i915_exit(void) > -- > 2.31.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch