On 2025-02-12 at 16:35:16 GMT, Andi Shyti wrote: > Hi Krzysztof, > > On Wed, Feb 12, 2025 at 12:50:17PM +0100, Krzysztof Niemiec wrote: > > On 2025-02-10 at 14:01:19 GMT, Andi Shyti wrote: > > > On Thu, Feb 06, 2025 at 07:07:38PM +0100, Janusz Krzysztofik wrote: > > > > We return immediately from i915_driver_register() if drm_dev_register() > > > > fails, skipping remaining registration steps. However, the _unregister() > > > > counterpart called at device remove knows nothing about that skip and > > > > executes reverts for all those steps. For that to work correctly, those > > > > revert functions must be resistant to being called even on uninitialized > > > > objects, or we must not skip their initialization. > > > > > > > > Three cases have been identified and fixes proposed. Call traces are > > > > taken from CI results of igt@i915_driver_load@reload-with-fault-injection > > > > execution, reported to several separate Gitlab issues (links provided). > > > > > > > > Immediate return was introduced to i915_driver_register() by commit > > > > ec3e00b4ee27 ("drm/i915: stop registering if drm_dev_register() fails"), > > > > however, quite a few things have changed since then. That's why I haven't > > > > mentioned it in a Fixes: tag to avoid it being picked up by stable, which > > > > I haven't tested. > > > > > > I'm not fully convinced about this series as I think that you are > > > fixing a subset of what needs to be handled properly. What about > > > hwmon? What about gt? what about debugfs? > > > > > > In my opinion we need to check in _unregister whether the > > > drm_dev_register has succeded and one way would be, e.g., to > > > check for the drm minor value, or even set the drm device tu NULL > > > (first things that come to my mind, maybe there are smarter ways > > > of doing it). This way we could skip some of the _unregister() > > > steps. > > > > > > > Is there any situation in which having the driver loaded after failing > > drm_dev_register() is of any use? Because if not, instead of recording > > the fact of registration failure, we can just stop the driver from > > loading altogether by checking drm_dev_register()'s return value, > > then calling _only_ the required _unregister steps as cleanup in an > > error path, and propagating the result up to driver probe. This way we > > don't need to store any additional information at all. > > as long as the driver works, why pushing it to fail? Janusz's > patch is really showing the case. Because the driver doesn't really work... it is loaded into the kernel but you can't access it the expected way because we failed to register it to userspace, and we also skipped a bunch of registration functions. Is there really any benefit to keep it loaded in that state? Also, i915 is the *only* driver in the drm directory that doesn't check drm_dev_register()'s return value. All other drivers at least check it, and the ones I took a closer look at (xe, nouveau, amdgpu, radeon, virtio) propagate errors in drm_dev_register() up to their pci .probe functions. So I think we're safe to handle it this way, and this wouldn't force the driver to keep information on whether the registration succeeded or not (if _unregister ever gets called, it's because we're exiting from a properly loaded driver, so no functions are to be skipped) > > Andi Thanks Krzysztof