Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support

Daniel Vetter <daniel@xxxxxxxx> · Mon, 6 Jul 2015 15:24:52 +0200

On Mon, Jul 06, 2015 at 01:44:10PM +0100, Dave Gordon wrote:
> On 24/06/15 11:29, Daniel Vetter wrote:
> >On Fri, Jun 19, 2015 at 09:43:11AM +0100, Dave Gordon wrote:
> >>On 18/06/15 15:49, Daniel Vetter wrote:
> >>>On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
> >>>>On 17/06/15 13:05, Daniel Vetter wrote:
> >>>>>On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
> >>>>>>Current devices may contain one or more programmable microcontrollers
> >>>>>>that need to have a firmware image (aka "binary blob") loaded from an
> >>>>>>external medium and transferred to the device's memory.
> >>>>>>
> >>>>>>This file provides generic support functions for doing this; they can
> >>>>>>then be used by each uC-specific loader, thus reducing code duplication
> >>>>>>and testing effort.
> >>>>>>
> >>>>>>Signed-off-by: Dave Gordon <david.s.gordon@xxxxxxxxx>
> >>>>>>Signed-off-by: Alex Dai <yu.dai@xxxxxxxxx>
> >>>>>
> >>>>>Given that I'm just shredding the synchronization used by the dmc loader
> >>>>>I'm not convinced this is a good idea. Abstraction has cost, and a bit of
> >>>>>copy-paste for similar sounding but slightly different things doesn't
> >>>>>sound awful to me. And the critical bit in all the firmware loading I've
> >>>>>seen thus far is in synchronizing the loading with other operations,
> >>>>>hiding that isn't a good idea. Worse if we enforce stuff like requiring
> >>>>>dev->struct_mutex.
> >>>>>-Daniel
> >>>>
> >>>>It's precisely because it's in some sense "trivial-but-tricky" that we
> >>>>should write it once, get it right, and use it everywhere. Copypaste
> >>>>/does/ sound awful; I've seen how the code this was derived from had
> >>>>already been cloned into three flavours, all different and all wrong.
> >>>>
> >>>>It's a very simple abstraction: one early call to kick things off as
> >>>>early as possible, no locking required. One late call with the
> >>>>struct_mutex held to complete the synchronisation and actually do the
> >>>>work, thus guaranteeing that the transfer to the target uC is done in a
> >>>>controlled fashion, at a time of the caller's choice, and by the
> >>>>driver's mainline thread, NOT by an asynchronous thread racing with
> >>>>other activity (which was one of the things wrong with the original
> >>>>version).
> >>>
> >>>Yeah I've seen the origins of this in the display code, and that code gets
> >>>the syncing wrong. The only thing that one has do to is grab a runtime pm
> >>>reference for the appropriate power well to prevent dc5 entry, and release
> >>>it when the firmware is loaded and initialized.
> >>
> >>Agreed.
> >>
> >>>Which means any kind of firmware loader which requires/uses
> >>>dev->struct_mutex get stuff wrong and is not appropriate everywhere.
> >>
> >>BUT, the loading of the firmware into any uC MUST be done in a
> >>controlled manner i.e. at a time when no other thread is touching the
> >>h/w. Otherwise the f/w load and whatever else is concurrently accessing
> >>the h/w could in some cases interfere disastrously. Examples of
> >>interference might be:
> >>
> >>* interleaved accesses to the ELSP (in the case of the GuC)
> >>* incorrect handover of power management (DMC, GuC)
> >>* erroneous management of forcewake state
> >>
> >>In general the f/w that is just starting on the uC may have certain
> >>expectations about the initial state of the h/w, which may not be met if
> >>other threads are accessing various bits of h/w while the uC is booting up.
> >>
> >>So we absolutely need to guarantee that the f/w load is done by a thread
> >>which has exclusive ownership of any bit of the h/w that the f/w is
> >>going to make assumptions about. With the current locking structure of
> >>the driver, that means holding the struct_mutex (it shouldn't really,
> >>there should be a separate mutex for h/w register access vs.
> >>driver-private data structures, but there isn't).
> >
> >If you really need this guarantee (and I seriously hope not) then the only
> >option is a synchronous firmware load at driver init _before_ we launch
> >any of the asynchronous setup code. And there is already a lot of that,
> >and we're adding more all the time.
> >
> >What I expect we need is synchronization of just the revelant part with
> >the firmware loading, which necessarily needs to be somewhat async to be
> >able to support cros/android requirements. And yes that needs to be done
> >in a controlled manner, but most likely we need very specific solutions
> >for the problem at hand. Unconditionally holding dev->struct_mutex isn't
> >that solution.
> >
> >The other problem with dev->struct_mutex is that it's a giantic lock with
> >ill defined coverage and semantics. It's imo the biggest piece of
> >technical debt we carry around in i915.ko, and we pay the price for that
> >dearly&daily. Which means that since a few years any kind of code
> >which extended dev->struct_mutex to anything not clearly core gem data
> >structures was rejected.
> 
> Oh, I quite agree that the struct_mutex is an abomination and would
> certainly like to eliminate it. But at the moment it's the only sufficiently
> large-scale synchronisation operation available to ensure that (for example)
> we don't try to load the f/w at the same time that another thread is trying
> to reset the h/w.

I guess this is the crux here - for me part of the big problems around
dev->struct_mutex is that it doesn't just protect data structures to
ensure they're consistent, it also intermingles a lot of lifetime rules
(e.g. holding struct_mutex synchronizes against any final gem bo unref
too). And my experience from fixing up a few horribly misguided locking
designs in drm subsystem is that in general using locks to do
synchronization leads to serious maintainance headaches a few years down
the road. So if you state that only struct_mutex is a big enough lock to
synchronize async guc init then that kills your design for me already. Yes
there's a grey area like always in design topics, but if givin how massive
(and massively undocumented) struct_mutex is I'm against going into that
grey area with extreme prejudice.

I guess I need to write a whitepaper or something about this (it's
commonly accepted wisdom in the drm subsystem), but the summary is
relative simple: Don't use locks to ensure ordering of concurrent
operations or solve lifetime problems. For ordering use waitqueues,
completions or the work/timer specific things to order stuff. For lifetime
problems use refcounting.

Ofc because struct_mutex is such a sprawling beast that's a bit tricky,
but the simple ordering problem should be solved with a flush_work of the
async guc loader or similar. You still need to grab the struct_mutex for
the async loader (because that touches shared gem state, i.e. it's about
data consistency), but not because of ordering issues.

> None of this loader code really needs the struct_mutex specifically; the
> WARN_ON macros were just there to help callers know what degree of
> synchronisation they need to organise before calling these functions.
> 
> [snip]
> 
> >>BTW, the existing DMC loader probably won't work on Android :(
> >
> >Yeah I completely missed out on this fun since I presumed that firmware
> >loading is easy and simple. And if you look around on other drm drivers it
> >indeed is, they all use a synchronous request_firmware and if the firmware
> >isn't there, they just fall over (fully in the case of radeon.ko,
> >partially in the case of nouveau.ko since they have all the support in
> >place for handling a kms-only accel-less gpu in userspace anyway, like we
> >do). But for a bunch of reasons (afaik it's "you can't include a blob in a
> >gpl-ed kernel image" we need async firmware loading for cros&android).
> >
> >That leaves us with a situation where we should have done a special design
> >discussion about asynchronous firmware, but somehow failed do to that.
> >Which leaves us in a very ugly position.
> >
> >I talked with a bunch of people over the past few days to figure out how
> >this is supposed to work and also figure out why it's being done like that
> >today. I think I have a reasonable good plan for moving forward too. I'll
> >start a new top-level thread here to discuss this.
> >
> >Thanks, Daniel
> 
> It really isn't "asynchronous", it's just "deferred" -- but implying that
> everything that relies on having the firmware available also has to be
> deferred. For the DMC, that means we can't have full PM; and for the GuC, we
> can't submit any batches at all to any engine until the f/w load is done.

Yeah I'm ok with just calling it "deferred", as long as we agree on the
meaning of "not run synchronously from driver load". And since there's no
reason to wait for userspace to do anything we might as well run anything
we defer in an async worker (and insert any necessary waits there), which
is why I just call all deferred driver load work "async".

> Really, it would be simpler if we didn't support automatic firmware loading
> in the kernel at all, and had a userland startup process whose job was to
> locate and transfer the required firmware before the device could be used.
> But that would just give us the opposite problem, because the display device
> is one of the things that /should/ be usable even before the rootfs is
> mounted.

request_firmware supports this userland process you're describing above,
if you enable CONFIG_FW_LOADER_USER_HELPER_FALLBACK. Which is what I
suggested Android should use (cros might run into a nack from google
upstream). Unfortunately (special snowflake driver that no one wanted to
convert) request_firmware_nowait does _not_ support this. That's why we
need to use request_firmware + our own worker (otherwise request_firmware
would just block everyone).

Wrt using kms apis before we have all the firmware loaded: That's
definitely I use case we should try to support, unfortunately the current
proposal of blocking for guc loading in gem_open will break that - there's
only one device node for both kms and rendering. Some of the other
proposals for async^Wdeferred gem init (both stalling in add_request or
reusing the reset-in-progress logic) would solve that, but have their own
sets of downsides. These kind of problems are exactly why I want a proper
design discussion about async^Wdeferred gem init: We need a clear set of
requirements, then weigh the different options against each another and
decide upon one of the 3-4 proposed thus far.

> Anyway, I've posted a simplified v3 which only supports synchronous fetch,
> without the prefetch capability. See separate thread ...

Will take a look, thanks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx