Hi Krzysztof,

On Wed, Feb 12, 2025 at 12:50:17PM +0100, Krzysztof Niemiec wrote:
> On 2025-02-10 at 14:01:19 GMT, Andi Shyti wrote:
> > On Thu, Feb 06, 2025 at 07:07:38PM +0100, Janusz Krzysztofik wrote:
> > > We return immediately from i915_driver_register() if drm_dev_register()
> > > fails, skipping remaining registration steps. However, the _unregister()
> > > counterpart called at device remove knows nothing about that skip and
> > > executes reverts for all those steps. For that to work correctly, those
> > > revert functions must be resistant to being called even on uninitialized
> > > objects, or we must not skip their initialization.
> > >
> > > Three cases have been identified and fixes proposed. Call traces are
> > > taken from CI results of igt@i915_driver_load@reload-with-fault-injection
> > > execution, reported to several separate Gitlab issues (links provided).
> > >
> > > Immediate return was introduced to i915_driver_register() by commit
> > > ec3e00b4ee27 ("drm/i915: stop registering if drm_dev_register() fails"),
> > > however, quite a few things have changed since then. That's why I haven't
> > > mentioned it in a Fixes: tag to avoid it being picked up by stable, which
> > > I haven't tested.
> >
> > I'm not fully convinced about this series as I think that you are
> > fixing a subset of what needs to be handled properly. What about
> > hwmon? What about gt? What about debugfs?
> >
> > In my opinion we need to check in _unregister whether the
> > drm_dev_register has succeeded and one way would be, e.g., to
> > check for the drm minor value, or even set the drm device to NULL
> > (first things that come to my mind, maybe there are smarter ways
> > of doing it). This way we could skip some of the _unregister()
> > steps.
> >
>
> Is there any situation in which having the driver loaded after failing
> drm_dev_register() is of any use? Because if not, instead of recording
> the fact of registration failure, we can just stop the driver from
> loading altogether by checking drm_dev_register()'s return value,
> then calling _only_ the required _unregister steps as cleanup in an
> error path, and propagating the result up to driver probe. This way we
> don't need to store any additional information at all.

As long as the driver works, why push it to fail? Janusz's patch
really shows the case.

Andi
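
For illustration only, the "check in _unregister() whether registration
succeeded" idea quoted above can be modelled in plain user-space C. This is
a rough sketch under stated assumptions, not i915 code: struct model_dev,
model_core_register(), model_register() and model_unregister() are all
hypothetical names, the boolean flag merely stands in for whatever state
the real driver could inspect (the drm minor, for example), and the hwmon
and debugfs fields are placeholders for the later registration steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the register/unregister pairing. */
struct model_dev {
	bool core_registered;    /* stands in for a successful drm_dev_register() */
	bool hwmon_registered;   /* placeholder for a later registration step */
	bool debugfs_registered; /* placeholder for a later registration step */
	int unregister_calls;    /* counts revert steps that actually ran */
};

/* Stand-in for drm_dev_register(); 'fail' emulates fault injection. */
static int model_core_register(struct model_dev *dev, bool fail)
{
	if (fail)
		return -1;
	dev->core_registered = true;
	return 0;
}

/* Mirrors the described behaviour: return early if core registration fails. */
static void model_register(struct model_dev *dev, bool inject_fault)
{
	assert(dev != NULL);

	if (model_core_register(dev, inject_fault))
		return; /* remaining steps are skipped */

	dev->hwmon_registered = true;
	dev->debugfs_registered = true;
}

/* Reverts only what was set up, keyed on the recorded registration state. */
static void model_unregister(struct model_dev *dev)
{
	assert(dev != NULL);

	if (!dev->core_registered)
		return; /* nothing past core registration exists to revert */

	dev->debugfs_registered = false;
	dev->unregister_calls++;
	dev->hwmon_registered = false;
	dev->unregister_calls++;
	dev->core_registered = false;
	dev->unregister_calls++;
}
```

The alternative raised in the quoted reply (fail the probe and unwind in an
error path) would need no recorded state; this model keeps the device alive
after the failure, so it has to remember the outcome instead.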