Hi all, Ok top post on this firmware loading problem. First to note imo is that off the over 80 request_firmware calls in drm drivers only one (outside of i915) uses request_firmware_nowait. And none of the other have async firmware loading implemented with their own driver. That was kinda my background for all this and why I missed the entire design implications - I expected that the entirety of the firmware loading problem would boil down to one request_firmware() call and that's it. Obviously things are not quite as simple, so I started to dig around. Looking at the big drivers (nouveau and radeon/amdgpu) they only ever try to load the firmware once and if it's not there they either fail driver init outright (amd) or disable parts of it (nouveau can e.g. disable accelaration). None of them ever bothers with a 2nd attempt at firmware loading. I confirmed with systemd/udev developers that synchronous firmware loading is indeed how modern systems are expected to work. Which also implies that if you want to load a driver from the initrd, then the firmware must be in the initrd (which modern linux distros ensure). And if you build the driver into the kernel then the firmware also must be built-in. Delayed loading of firmware is considered broken userspace. The reason for this design switch is that with the delayed loading done years ago random timeouts when the firmware would never show up routinely showed up when booting. And on a general-purpose distro there's no way to know whether the firmware isn't loaded yet because userspace isn't ready or because it's simply not there. Hence the switch over to immediate failure and requiring the firmware to be present when the driver is initializing. So from that pov the only (and really the only thing) we need for enabling firmware loading on a normal linux distro is a synchronous request_firmware call in the driver load function somewhere. This might need some adjustements to the initrd builders in userspace, but that's all we need in the kernel. This is also all we need to enable internal testing, since as long as you don't ship the resulting image anywhere you can always built-in the firmware, which also resolves any "firmware isn't there yet" bugs. Now if you want delayed firmware loading there's still support for that, hidden behind the FW_LOADER_USER_HELPER_FALLBACK kernel config. Which because of the unfixable timeouts is disabled by default, and also no longer supported on the userspace side in modern distros. But the uevent handler is only 3 lines of bash, so not a big deal to wire up. Except that for hysterical raisins (one special snowflake driver needs it like that, and the patch author decided it's easier to have an inconsisten interface than fix up that driver) this userspace fallback stuff only works with the request_firmware interface, and _not_ with request_firmware_nowait. Despite that both run essentially the same code, _nowait simply wraps it up in an async work for convenience. That means to be able to use this we need our own async work item and use request_firmware in there. Converting the implicit async work to an explicit one is better anyway since it makes it more likely that patch authors and reviewers don't forget to correctly synchronize/flush with that async work. DMC loader is a perfect example of why implicit async callbacks are evil in kernel code. The problem now is that doing parts of gem init asynchronously is really scary, and we never had a proper design discussion about that. And unsurprisingly a lot of the review comments center around that. Otoh we don't want to block merging guc-based cmd submission just on that. Therefore my proposal is to rip out the current firmwareloader and async respectively re-loading code with all it's complications fromm the current patch series, and replace it with a simple&synchronous request_firmware. Also no fallbacks to keep everything simple. The actual guc enabling itself should carry over unchanged. With that we can unblock everyone (at least internally) and figure out a proper solutions for async gem init. To kickstart the design discussion for that a few general and gem-specific things I think are important: - Locking, async work and coordination should imo all be done explicitly in the code by default. Otherwise chances are way to high that critical issues get overlooked in the detailed code review. That means an explicit schedule_work and for synchronization explicit flush_work. No hiding them in frameworks and helper libraries. - dev->struct_mutex is scary and shouldn't ever be extended. New stuff needs new locking. Might not apply here for guc specifically, but just a general principle. - I'm absolutely scared off platform/feature differences in how the overall driver load sequence works. Which means if we need async gem init for guc, then we will do async gem init for everyone. That means improved test coverage for corner-cases and leass headaches to keep the unified driver afloat since there's only one way to do things. Of course this only applies to hw features actually shared between platforms, e.g. if we do async dmc loading that doesn't mean we need to do async loading of some other unrelated display feature. GuC is special here because cmd submission is the entire point of gem and hence has a big impact. - If we do async init at driver load time, then we'll also do the same on resume and gpu reset, for the same reasons as the cross-platform consistency. For GuC async gem init on gpu reset might not make that much sense, since gpu reset is pretty much just gem (re) init and it's already an async worker. But async gem init on resume definitely makes sense. - A common confusion I see with concurrent code is mixing up data consistency (resolved with locking) and handling synchronization and lifetimes (resolved with refcounting, completions and other synchronization design patterns). There's grey areas and exceptions, but I'm fairly picky on this topic. Also a big no-go is rolling your own synchronization - we have two of these in i915 (pageflip completion and reset handling) and both are still suspect or outright broken after years of banging heads against these walls. - For async gem init it's tempting to repurpose the gpu reset code with all the existing synchronization: We'd only need to set the RESET_IN_PROGRESS flag synchronously and then run a gpu reset after the firmware is loaded, all from the async worker. The problem with that is that we still have issues with rogue -EIO escaping into the modeset code while a reset is pending, and that would not be good since we want modesets to work right away ofc. Also most modeset paths need to stall for gpu resets synchronously, which again isn't what we want. - We should be able to reuse at least the locking scheme gpu reset relies on. And since gpu resets run gem_init_hw we should be able to run that asynchronously, with some care. - Beside synchronizing at open time like you're doing now, or in the guts of the lrc ctx sumbission code like Chris suggesteds the now merged anti-olr also gives us the possibility to sync at request creation. The problems I see with that is locking inversions between flush_work and dev->struct_mutex. But we could fix that with the ioctl restart handling we already have. But that has the same caveats as the -EIO handling - our modeset code doesn't cope well with -EAGAIN. For me the exact synchronization point with the async gem init work is the really big open here. Anyway that's my thoughts on firmware loading and all that, I hope this is useful as a basis to get this merged in an efficient way. Yours, Daniel On Mon, Jun 15, 2015 at 07:36:18PM +0100, Dave Gordon wrote: > This patch series enables command submission via the GuC. In this mode, > instead of the host CPU driving the execlist port directly, it hands > over work items to the GuC, using a doorbell mechanism to tell the GuC > that new items have been added to its work queue. The GuC then dispatches > contexts to the various GPU engines, and manages the resulting context- > switch interrupts. Completion of a batch is however still signalled to > the CPU; the GuC is not involved in handling user interrupts. > > There are three subsequences within the patch series: > > drm/i915: Add i915_gem_object_write() to i915_gem.c > drm/i915: Embedded microcontroller (uC) firmware loading support > > These first two patches provide a generic framework for fetching the > firmware that may be required by any embedded microcontroller from a > file, using an asynchronous thread so that driver initialisation can > continue while the firmware is being fetched. It is hoped that this > framework is sufficiently general that it can be used for all curent > and future microcontrollers. > > drm/i915: Add GuC-related module parameters > drm/i915: Add GuC-related header files > drm/i915: GuC-specific firmware loader > drm/i915: Debugfs interface to read GuC load status > > These four patches complete the GuC loader. At this point in the sequence > we can load and activate the GuC firmware, but not submit any batches > through it. (This is nonetheless a potentially useful state, as the GuC > can do other useful work even when not handling batch submissions). > > drm/i915: Defer default hardware context initialisation until first > drm/i915: Move execlists defines from .c to .h > drm/i915: GuC submission setup, phase 1 > drm/i915: Enable GuC firmware log > drm/i915: Implementation of GuC client > drm/i915: Interrupt routing for GuC submission > drm/i915: Integrate GuC-based command submission > drm/i915: Debugfs interface for GuC submission statistics > Documentation/drm: kerneldoc for GuC > drm/i915: Enable GuC submission, where supported > > In the final section, we implement the GuC submission mechanism, link > it into the (execlist-based) submission path, and finally enable it > (on supported platforms). On platforms where there is no GuC, or if > the GuC firmware cannot be found or is invalid, batch submission will > revert to using the execlist mechanism directly. > > The GuC firmware itself is not included in this patchset; it is or will > be available for download from https://01.org/linuxgraphics/downloads/ > This driver works with and requires GuC firmware revision 3.x. It will > not work with any firmware version 1.x, as the GuC protocol in those > revisions was incompatible and is no longer supported. > > Prerequisites: GuC submission will expose existing inadequacies in > some of the existing codepaths unless certain other patches are applied. > In particular we will require some version of Michel Thierry's patch > drm/i915/lrc: Update PDPx registers with lri commands > (because the GuC support light-restore, which execlist mode doesn't), > and my own > drm/i915: Allocate OLR more safely (workaround until OLR goes away) > because otherwise the changed timing means that there is an increased > risk of writing to a ringbuffer that is not currently pinned & mapped, > causing a kernel OOPS. > > Alex Dai (10): > drm/i915: Add i915_gem_object_write() to i915_gem.c > drm/i915: Add GuC-related module parameters > drm/i915: Add GuC-related header files > drm/i915: GuC-specific firmware loader > drm/i915: Debugfs interface to read GuC load status > drm/i915: GuC submission setup, phase 1 > drm/i915: Enable GuC firmware log > drm/i915: Implementation of GuC client > drm/i915: Integrate GuC-based command submission > Documentation/drm: kerneldoc for GuC > > Dave Gordon (5): > drm/i915: Embedded microcontroller (uC) firmware loading support > drm/i915: Defer default hardware context initialisation until first > drm/i915: Interrupt routing for GuC submission > drm/i915: Debugfs interface for GuC submission statistics > drm/i915: Enable GuC submission, where supported > > Michael H. Nguyen (1): > drm/i915: Move execlists defines from .c to .h > > Ben Widawsky > Vinit Azad > created the original versions on which some of these patches are based. > > Documentation/DocBook/drm.tmpl | 19 + > drivers/gpu/drm/i915/Makefile | 7 + > drivers/gpu/drm/i915/i915_debugfs.c | 109 +++- > drivers/gpu/drm/i915/i915_dma.c | 4 + > drivers/gpu/drm/i915/i915_drv.h | 17 + > drivers/gpu/drm/i915/i915_gem.c | 39 +- > drivers/gpu/drm/i915/i915_gem_context.c | 52 +- > drivers/gpu/drm/i915/i915_guc_submission.c | 873 ++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/i915_irq.c | 48 ++ > drivers/gpu/drm/i915/i915_params.c | 9 + > drivers/gpu/drm/i915/i915_reg.h | 92 ++- > drivers/gpu/drm/i915/intel_guc.h | 184 ++++++ > drivers/gpu/drm/i915/intel_guc_api.h | 227 ++++++++ > drivers/gpu/drm/i915/intel_guc_loader.c | 498 ++++++++++++++++ > drivers/gpu/drm/i915/intel_lrc.c | 128 ++-- > drivers/gpu/drm/i915/intel_lrc.h | 8 + > drivers/gpu/drm/i915/intel_uc_loader.c | 312 ++++++++++ > drivers/gpu/drm/i915/intel_uc_loader.h | 82 +++ > 18 files changed, 2607 insertions(+), 101 deletions(-) > create mode 100644 drivers/gpu/drm/i915/i915_guc_submission.c > create mode 100644 drivers/gpu/drm/i915/intel_guc.h > create mode 100644 drivers/gpu/drm/i915/intel_guc_api.h > create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c > create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c > create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h > > -- > 1.7.9.5 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx