On Tue, Jul 20, 2021 at 01:13:54PM -0500, Jason Ekstrand wrote: > If the driver was not fully loaded, we may still have globals lying > around. If we don't tear those down in i915_exit(), we'll leak a bunch > of memory slabs. This can happen two ways: use_kms = false and if we've > run mock selftests. In either case, we have an early exit from > i915_init which happens after i915_globals_init() and we need to clean > up those globals. > > The mock selftests case is especially sticky. The load isn't entirely > a no-op. We actually do quite a bit inside those selftests including > allocating a bunch of mock objects and running tests on them. Once all > those tests are complete, we exit early from i915_init(). Perviously, > i915_init() would return a non-zero error code on failure and a zero > error code on success. In the success case, we would get to i915_exit() > and check i915_pci_driver.driver.owner to detect if i915_init exited early > and do nothing. In the failure case, we would fail i915_init() but > there would be no opportunity to clean up globals. > > The most annoying part is that you don't actually notice the failure as > part of the self-tests since leaking a bit of memory, while bad, doesn't > result in anything observable from userspace. Instead, the next time we > load the driver (usually for next IGT test), i915_globals_init() gets > invoked again, we go to allocate a bunch of new memory slabs, those > implicitly create debugfs entries, and debugfs warns that we're trying > to create directories and files that already exist. Since this all > happens as part of the next driver load, it shows up in the dmesg-warn > of whatever IGT test ran after the mock selftests. > > While the obvious thing to do here might be to call i915_globals_exit() > after selftests, that's not actually safe. The dma-buf selftests call > i915_gem_prime_export which creates a file. We call dma_buf_put() on > the resulting dmabuf which calls fput() on the file. However, fput() > isn't immediate and gets flushed right before syscall returns. This > means that all the fput()s from the selftests don't happen until right > before the module load syscall used to fire off the selftests returns > which is after i915_init(). If we call i915_globals_exit() in > i915_init() after selftests, we end up freeing slabs out from under > objects which won't get released until fput() is flushed at the end of > the module load syscall. > > The solution here is to let i915_init() return success early and detect > the early success in i915_exit() and only tear down globals and nothing > else. This way the module loads successfully, regardless of the success > or failure of the tests. Because we've not enumerated any PCI devices, > no device nodes are created and it's entirely useless from userspace. > The only thing the module does at that point is hold on to a bit of > memory until we unload it and i915_exit() is called. Importantly, this > means that everything from our selftests has the ability to properly > flush out between i915_init() and i915_exit() because there is at least > one syscall boundary in between. > > In order to handle all the delicate init/exit cases, we convert the > whole thing to a table of init/exit pairs and track the init status in > the new init_progress global. This allows us to ensure that i915_exit() > always tears down exactly the things that i915_init() successfully > initialized. We also allow early-exit of i915_init() without failure by > an init function returning > 0. This is useful for nomodeset, and > selftests. For the mock selftests, we convert them to always return 1 > so we get the desired behavior of the driver always succeeding to load > the driver and then properly tearing down the partially loaded driver. > > Signed-off-by: Jason Ekstrand <jason@xxxxxxxxxxxxxx> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> g.l.o.r.i.o.u.s. Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx> > --- > drivers/gpu/drm/i915/i915_pci.c | 104 ++++++++++++------ > drivers/gpu/drm/i915/i915_perf.c | 3 +- > drivers/gpu/drm/i915/i915_perf.h | 2 +- > drivers/gpu/drm/i915/i915_pmu.c | 4 +- > drivers/gpu/drm/i915/i915_pmu.h | 4 +- > .../gpu/drm/i915/selftests/i915_selftest.c | 2 +- > 6 files changed, 80 insertions(+), 39 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 4e627b57d31a2..64ebd89eae6ce 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -1185,27 +1185,9 @@ static void i915_pci_shutdown(struct pci_dev *pdev) > i915_driver_shutdown(i915); > } > > -static struct pci_driver i915_pci_driver = { > - .name = DRIVER_NAME, > - .id_table = pciidlist, > - .probe = i915_pci_probe, > - .remove = i915_pci_remove, > - .shutdown = i915_pci_shutdown, > - .driver.pm = &i915_pm_ops, > -}; > - > -static int __init i915_init(void) > +static int i915_check_nomodeset(void) > { > bool use_kms = true; > - int err; > - > - err = i915_globals_init(); > - if (err) > - return err; > - > - err = i915_mock_selftests(); > - if (err) > - return err > 0 ? 0 : err; > > /* > * Enable KMS by default, unless explicitly overriden by > @@ -1222,31 +1204,87 @@ static int __init i915_init(void) > if (!use_kms) { > /* Silently fail loading to not upset userspace. */ > DRM_DEBUG_DRIVER("KMS disabled.\n"); > - return 0; > + return 1; > } > > - i915_pmu_init(); > + return 0; > +} > > - err = pci_register_driver(&i915_pci_driver); > - if (err) { > - i915_pmu_exit(); > - i915_globals_exit(); > - return err; > +static struct pci_driver i915_pci_driver = { > + .name = DRIVER_NAME, > + .id_table = pciidlist, > + .probe = i915_pci_probe, > + .remove = i915_pci_remove, > + .shutdown = i915_pci_shutdown, > + .driver.pm = &i915_pm_ops, > +}; > + > +static int i915_register_pci_driver(void) > +{ > + return pci_register_driver(&i915_pci_driver); > +} > + > +static void i915_unregister_pci_driver(void) > +{ > + pci_unregister_driver(&i915_pci_driver); > +} > + > +static const struct { > + int (*init)(void); > + void (*exit)(void); > +} init_funcs[] = { > + { i915_globals_init, i915_globals_exit }, > + { i915_mock_selftests, NULL }, > + { i915_check_nomodeset, NULL }, > + { i915_pmu_init, i915_pmu_exit }, > + { i915_register_pci_driver, i915_unregister_pci_driver }, > + { i915_perf_sysctl_register, i915_perf_sysctl_unregister }, > +}; > +static int init_progress; > + > +static int __init i915_init(void) > +{ > + int err, i; > + > + for (i = 0; i < ARRAY_SIZE(init_funcs); i++) { > + err = init_funcs[i].init(); > + if (err < 0) { > + while (i--) { > + if (init_funcs[i].exit) > + init_funcs[i].exit(); > + } > + return err; > + } else if (err > 0) { > + /* > + * Early-exit success is reserved for things which > + * don't have an exit() function because we have no > + * idea how far they got or how to partially tear > + * them down. > + */ > + WARN_ON(init_funcs[i].exit); > + > + /* > + * We don't want to advertise devices with an only > + * partially initialized driver. > + */ > + WARN_ON(i915_pci_driver.driver.owner); > + break; > + } > } > > - i915_perf_sysctl_register(); > + init_progress = i; > + > return 0; > } > > static void __exit i915_exit(void) > { > - if (!i915_pci_driver.driver.owner) > - return; > + int i; > > - i915_perf_sysctl_unregister(); > - pci_unregister_driver(&i915_pci_driver); > - i915_pmu_exit(); > - i915_globals_exit(); > + for (i = init_progress - 1; i >= 0; i--) { > + if (init_funcs[i].exit) > + init_funcs[i].exit(); > + } > } > > module_init(i915_init); > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index b4ec114a4698b..48ddb363b3bda 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -4483,9 +4483,10 @@ static int destroy_config(int id, void *p, void *data) > return 0; > } > > -void i915_perf_sysctl_register(void) > +int i915_perf_sysctl_register(void) > { > sysctl_header = register_sysctl_table(dev_root); > + return 0; > } > > void i915_perf_sysctl_unregister(void) > diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h > index 882fdd0a76800..1d1329e5af3ae 100644 > --- a/drivers/gpu/drm/i915/i915_perf.h > +++ b/drivers/gpu/drm/i915/i915_perf.h > @@ -23,7 +23,7 @@ void i915_perf_fini(struct drm_i915_private *i915); > void i915_perf_register(struct drm_i915_private *i915); > void i915_perf_unregister(struct drm_i915_private *i915); > int i915_perf_ioctl_version(void); > -void i915_perf_sysctl_register(void); > +int i915_perf_sysctl_register(void); > void i915_perf_sysctl_unregister(void); > > int i915_perf_open_ioctl(struct drm_device *dev, void *data, > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c > index 34d37d46a1262..eca92076f31d2 100644 > --- a/drivers/gpu/drm/i915/i915_pmu.c > +++ b/drivers/gpu/drm/i915/i915_pmu.c > @@ -1088,7 +1088,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node) > > static enum cpuhp_state cpuhp_slot = CPUHP_INVALID; > > -void i915_pmu_init(void) > +int i915_pmu_init(void) > { > int ret; > > @@ -1101,6 +1101,8 @@ void i915_pmu_init(void) > ret); > else > cpuhp_slot = ret; > + > + return 0; > } > > void i915_pmu_exit(void) > diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h > index 60f9595f902cd..449057648f39b 100644 > --- a/drivers/gpu/drm/i915/i915_pmu.h > +++ b/drivers/gpu/drm/i915/i915_pmu.h > @@ -147,14 +147,14 @@ struct i915_pmu { > }; > > #ifdef CONFIG_PERF_EVENTS > -void i915_pmu_init(void); > +int i915_pmu_init(void); > void i915_pmu_exit(void); > void i915_pmu_register(struct drm_i915_private *i915); > void i915_pmu_unregister(struct drm_i915_private *i915); > void i915_pmu_gt_parked(struct drm_i915_private *i915); > void i915_pmu_gt_unparked(struct drm_i915_private *i915); > #else > -static inline void i915_pmu_init(void) {} > +static inline int i915_pmu_init(void) { return 0; } > static inline void i915_pmu_exit(void) {} > static inline void i915_pmu_register(struct drm_i915_private *i915) {} > static inline void i915_pmu_unregister(struct drm_i915_private *i915) {} > diff --git a/drivers/gpu/drm/i915/selftests/i915_selftest.c b/drivers/gpu/drm/i915/selftests/i915_selftest.c > index 1bc11c09faef5..935d065725345 100644 > --- a/drivers/gpu/drm/i915/selftests/i915_selftest.c > +++ b/drivers/gpu/drm/i915/selftests/i915_selftest.c > @@ -187,7 +187,7 @@ int i915_mock_selftests(void) > err = run_selftests(mock, NULL); > if (err) { > i915_selftest.mock = err; > - return err; > + return 1; > } > > if (i915_selftest.mock < 0) { > -- > 2.31.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch