Re: [PATCH 3/6] drm/i915/huc: Add HuC fw loading support

Dave Gordon <david.s.gordon@xxxxxxxxx> · Fri, 15 Jul 2016 08:33:06 +0100

On 14/07/16 15:26, Daniel Vetter wrote:
On Thu, Jul 14, 2016 at 03:08:41PM +0100, Dave Gordon wrote:
On 13/07/16 13:48, Daniel Vetter wrote:
On Thu, Jun 23, 2016 at 02:52:41PM +0100, Peter Antoine wrote:
On Thu, 23 Jun 2016, Dave Gordon wrote:
On 22/06/16 09:31, Daniel Vetter wrote:
No, the *correct* fix is to unify all the firmware loaders we have.
There should just be ONE piece of code that can be used to fetch and
load ANY firmware into ANY auxiliary microcontroller. NOT one per
microcontroller, all different -- that way lies madness.

We already had a unified loader for the HuC and GuC a year ago, but IIRC
the party line then was "just make it (GuC) specific, then copypaste it
for the second uC, and when we've got three versions we'll have learnt
how we really want a unified loader to behave."

Well, here's the copypaste, and we already have a different loader for
the DMC/CSR, so it must be time for (re-)unification.

.Dave.

Just to add, even if your uc_fw_fetch() returns an error code you will still
have to remember the state of the fetch; otherwise at each reset/resume/etc.
you will have to try the firmware load again, and that can take a long time.
So the state will have to be re-instated.

Seeing as this code was written with the given goals, and in the same vein
as code that was deemed acceptable, it seems weird at this late stage to
change the design goals.

Note: this is the third time that these patches have been posted, and they
were only rejected (as far as I know) because there was no open-source user.
There is one now, which is why I have reposted these patches.

I never liked the guc firmware code, but figure for one copy it's not
worth fighting over. Adding more copies (or perpetuating the design by
making it generic) isn't what I'm looking for.

*You* asked for more copies, back when we proposed a single unified solution
last year. We already had a *single* GuC+HuC loader which could also have
been extended to support the DMC as well, but at the time you wanted a
GuC-specific version -- and by implication, a separate HuC loader -- *in
addition to* the DMC loader.

Firmware loading shouldn't be that complicated, really.

Maybe it shouldn't be, and maybe it isn't -- you may not be seeing how
simple this code actually is. Fetch firmware, validate it, save it in a GEM
object; later, DMA it to the h/w; at each stage keep track of status so we
know what has been done and what is still to do (or redo, during reset).

Any complications are because the h/w (e.g. write-once memory) makes them
necessary, or artefacts of the GEM object system, or because of the driver's
byzantine sequence of operations during load/reset/suspend/resume/unload.
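
As a rough illustration of the sequence described above (fetch, validate,
save; later DMA; track status at each stage), here is a minimal user-space
sketch. It is NOT the patchset's code: struct uc_fw, uc_fw_fetch(),
uc_fw_load() and uc_fw_reset() are hypothetical names, a malloc()ed buffer
stands in for the GEM object, and the "DMA" is a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; not the i915 structures. */
enum fw_status { FW_NONE, FW_PENDING, FW_SUCCESS, FW_FAIL };

struct uc_fw {
	enum fw_status fetch_status;	/* fetch + validate + save */
	enum fw_status load_status;	/* transfer to the h/w */
	unsigned char *blob;		/* stands in for the GEM object */
	size_t size;
	unsigned int fetch_count;	/* how often the fetch really ran */
};

/* Stand-in for request_firmware() plus validation; runs at most once. */
static void uc_fw_fetch(struct uc_fw *fw, const void *image, size_t size)
{
	if (fw->fetch_status != FW_NONE)
		return;			/* already succeeded or failed */

	fw->fetch_status = FW_PENDING;
	fw->fetch_count++;

	/* "Validation" in this sketch is only a non-empty check. */
	if (!image || !size)
		goto fail;

	fw->blob = malloc(size);
	if (!fw->blob)
		goto fail;

	memcpy(fw->blob, image, size);
	fw->size = size;
	fw->fetch_status = FW_SUCCESS;
	return;

fail:
	fw->fetch_status = FW_FAIL;
}

/* Stand-in for the DMA step; may be repeated after reset/resume. */
static int uc_fw_load(struct uc_fw *fw)
{
	if (fw->fetch_status != FW_SUCCESS)
		return -EIO;		/* nothing valid to load */

	/* A successful fetch must have left a saved blob behind. */
	assert(fw->blob && fw->size);

	fw->load_status = FW_PENDING;
	/* ... real code would program the DMA engine and wait here ... */
	fw->load_status = FW_SUCCESS;
	return 0;
}

/* A reset discards what the h/w holds, not the saved blob. */
static void uc_fw_reset(struct uc_fw *fw)
{
	fw->load_status = FW_NONE;
}
```

The point of keeping two status fields is that a reset only clears
load_status, so the next uc_fw_load() reuses the saved blob and the
filesystem is never touched a second time.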

The unified firmware loader is called request_firmware. If that's not good
enough, pls fix the core function, don't paper over it with code in i915.

That's exactly the function we call. Then we have to validate and save the
blob. And remember that we've done so.

In that regard DMC/CSR is unified, everything else isn't yet.

Unified with what? Maybe the "DMC" is unified with the "CSR" -- which AFAIK
are the same thing -- and the software just randomly uses both names to
maximise confusion?

	if (HAS_CSR(dev)) {
		struct intel_csr *csr = &dev_priv->csr;

		err_printf(m, "DMC loaded: %s\n",
			   yesno(csr->dmc_payload != NULL));
		err_printf(m, "DMC fw version: %d.%d\n",
			   CSR_VERSION_MAJOR(csr->version),
			   CSR_VERSION_MINOR(csr->version));
	}
...
	if (!IS_GEN9(dev_priv)) {
		DRM_ERROR("No CSR support available for this platform\n");
		return;
	}
	if (!dev_priv->csr.dmc_payload) {
		DRM_ERROR("Tried to program CSR with empty payload\n");
		return;
	}

And according to the comments in intel_csr.c -- but not the code --

/*
  * Firmware loading status will be one of the below states:
  * FW_UNINITIALIZED, FW_LOADED, FW_FAILED.
  *
  * Once the firmware is written into the registers status will
  * be moved from FW_UNINITIALIZED to FW_LOADED and for any
  * erroneous condition status will be moved to FW_FAILED.
  */

So I don't think you should hold this code up as a masterpiece of "unified"
design -- which in any case you argued against last year, when we presented
a unified loader. Specifically, you said, "In my experience trying to
extract common code at all costs is harmful way too often."

Also, the approach taken in the DMC loader -- which appears to have been
copypasted from a /very early/ version of the GuC loader, before I fixed the
async-load problems -- just wouldn't work for the HuC/GuC, where the kernel
needs to know when the firmware load has been completed so that it can start
sending work to the GuC. The DMC loader only works because it doesn't
actually matter when (or if) it's loaded. It would be *completely wrong* to
load the HuC/GuC that way.

Iirc the big issue is delayed firmware loading for built-in i915 and fw
only available later on. This has been an open issue in request_firmware()
for years, and there are various patches floating around. If the problem is
that Greg KH doesn't consider those patches, I can help with that. But not
pushing the core fix forward isn't acceptable imo.

We're not addressing that issue at all here; for Linux we expect the
firmware will be in the ramdisk so it's available immediately. Android has
an issue with that, but we already have solutions there.

Cros has the same issue, and it just cropped up again because Google is
unhappy that this is still not fixed in upstream. This is by no means
android specific at all, no reason to have a hack for it only in android.

Then you can take, or adapt, the patch we use in Android. It's very 
simple -- we defer as much engine initialisation as possible until first 
open. But that's nothing to do with this patchset.

Once that fix is landed
we can treat request_firmware as reliable (it might take a while, hence
must be run in an async work like DMC loading), with no need to ever retry
anything.

No, it *can't* run "in an async work like DMC loading" -- that was exactly
what was wrong with the original GuC loader before I got involved. The
firmware was delivered to the GuC h/w asynchronously, *after* the kernel had
already started sending work to the engines. That was utterly bogus!

*This* version is fully synchronous; the kernel calls for the firmware using
request_firmware() (and waits until it's succeeded or failed), and later
asks for the firmware to be loaded into the GuC (again, waiting until it has
succeeded or failed).

You need to appropriately sync with async work at the right point. For dmc
that's done using rpm/power domain refcounts, it's not nonexisting (since
indeed just running stuff async would fall over).

Actually the DMC load synchronisation *is* nonexistent. Mainline code 
*cannot tell* whether the DMC firmware has been loaded (unless there's 
some way to ask the DMC itself). It *cannot* (for example) stall until 
the load completes, because all it can tell is that there's a nonzero 
refcount on a power domain, not who holds it or why. That's *OK* for the 
DMC because the driver doesn't /need/ to know; it doesn't talk to the 
DMC firmware anyway. But for the GuC/HuC, we need to know.

For GuC/HuC we need some way to synchronize elsewhere, like flush_work
in execbuf.

And if the firmware failed to load after flush_work(), then we can
reliably fail execbuf with EIO (like in any other case when the hw is
considered too dead to be useful). And if flush_work is too expensive,
we can create a nice completion, where the fastpath is lockless. I still
don't see why a big state machinery is needed for this at all.

The completion approach is *exactly* what we used in the original 
unified loader design with asynchronous fetch-ahead and correctly 
synchronised loading, and *you* objected to it, even though it was 
already coded for the GuC+HuC and could easily have been used for the 
DMC as well.
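
For readers unfamiliar with the idea, a completion with a lockless
fastpath can be sketched in user-space C roughly as below. This is only an
analogy built on pthreads and C11 atomics; it is neither the kernel's
struct completion nor the original unified loader, and struct fw_done,
fw_done_complete(), fw_done_wait() and submit_work() are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a firmware-load completion. */
struct fw_done {
	atomic_int done;	/* 0 until the load has finished */
	int result;		/* 0 or negative errno, valid once done */
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

#define FW_DONE_INIT \
	{ 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Called once by whoever performs the load, on success or failure. */
static void fw_done_complete(struct fw_done *d, int result)
{
	assert(result <= 0);	/* 0 or a negative errno */

	pthread_mutex_lock(&d->lock);
	d->result = result;
	/* Release: publish result before any waiter can observe done. */
	atomic_store_explicit(&d->done, 1, memory_order_release);
	pthread_cond_broadcast(&d->cond);
	pthread_mutex_unlock(&d->lock);
}

/* Returns the load result, sleeping only if the load is still running. */
static int fw_done_wait(struct fw_done *d)
{
	/* Fastpath: no lock is taken once the load has finished. */
	if (atomic_load_explicit(&d->done, memory_order_acquire))
		return d->result;

	/* Slowpath: sleep until fw_done_complete() signals us. */
	pthread_mutex_lock(&d->lock);
	while (!atomic_load_explicit(&d->done, memory_order_relaxed))
		pthread_cond_wait(&d->cond, &d->lock);
	pthread_mutex_unlock(&d->lock);
	return d->result;
}

/* Submission path: any load failure is reported to the caller as -EIO. */
static int submit_work(struct fw_done *d)
{
	return fw_done_wait(d) ? -EIO : 0;
}
```

In this sketch the load itself would run in a separate worker thread that
ends by calling fw_done_complete(); every submission after that point takes
only the atomic load on the fastpath.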

And if this version is fully synchronous, why does it even need such a
complicated status handling? Fall over with -EIO really should be
enough.

It's NOT complicated, and it IS necessary.

A state machine with only FOUR states and FIVE transitions 
(NONE->PENDING->[SUCCESS|FAIL]->NONE) is hardly complicated!

And it's NECESSARY because each of the fetch-from-filesystem and 
push-into-WOPCM operations are separate phases, and we want to do the 
first just once -- especially as, in this synchronous scheme, it may 
delay driver initialisation if we have to wait for the user helper. And 
(unlike the DMC) the GuC and HuC have to be DMA-REloaded at various 
points (RC6, resume, reset, etc). So OF COURSE we have to pass status 
from the fetch to the DMA loader, and we have to remember whether each 
stage has not yet started, is in progress, has been successful, or has 
already failed. You could move the GuC state out into the general i915 
data and have the /callers/ of the GuC-related functions hold and 
interpret it, but that would make the code far less modular.
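
To make the four states and five transitions concrete, here is a small
stand-alone sketch. The enum and function names are invented for the
example (the driver's own identifiers differ); only the shape
NONE->PENDING->[SUCCESS|FAIL]->NONE comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they mirror the states listed above. */
enum fetch_status {
	FETCH_NONE,
	FETCH_PENDING,
	FETCH_SUCCESS,
	FETCH_FAIL,
	FETCH_NSTATES		/* count of states, not a state itself */
};

/* True only for NONE->PENDING->[SUCCESS|FAIL]->NONE. */
static bool fetch_transition_ok(enum fetch_status from, enum fetch_status to)
{
	switch (from) {
	case FETCH_NONE:
		return to == FETCH_PENDING;
	case FETCH_PENDING:
		return to == FETCH_SUCCESS || to == FETCH_FAIL;
	case FETCH_SUCCESS:
	case FETCH_FAIL:
		return to == FETCH_NONE;
	default:
		return false;
	}
}

/* Change state, trapping any transition outside the linear sequence. */
static void fetch_set_status(enum fetch_status *cur, enum fetch_status next)
{
	assert(fetch_transition_ok(*cur, next));
	*cur = next;
}

/* Count the legal transitions by brute force over all state pairs. */
static unsigned int fetch_count_transitions(void)
{
	unsigned int n = 0;

	for (int from = 0; from < FETCH_NSTATES; from++)
		for (int to = 0; to < FETCH_NSTATES; to++)
			if (fetch_transition_ok((enum fetch_status)from,
						(enum fetch_status)to))
				n++;
	return n;
}
```

Enumerating every pair of states this way finds exactly the five legal
transitions and nothing else.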

If fw loading fails we can just mark the entire render part of
the gpu as dead by injecting the equivalent of a non-recoverable hang
(async setup) or failing engine init with -EIO (if this is still
synchronous, which I don't expect really).

Which is just what we do. This patchset is essentially just adding HuC
loading to the existing GuC loading process, reusing as much as possible of
the same code.

If there's another reason for this complexity, please explain since I'd
like to understand why we need this.
-Daniel

Less complexity than you think.

   531  1698 14255 drivers/gpu/drm/i915/intel_csr.c
   751  2757 22889 drivers/gpu/drm/i915/intel_guc_loader.c

Much the same size, to within a (binary) order-of-magnitude. Obviously the
GuC code *necessarily* does more because the f/w interfaces are much more
complex (ctx pool, ADS, etc); but those are not optional. And the GuC code
has to deal with reloading after RC6 or GPU reset, which AFAICT the DMC
doesn't.

I'm not against code size, but against the status matrix. I nuked it from
intel_csr.c, I want it gone from the guc/huc loader too. It imo serves no
point.
-Daniel

There's no "matrix", just the simplest possible linear sequence.
There are FIVE, count 'em, FIVE assignments to guc_fw_fetch_status in the 
code, representing exactly those transitions mentioned above
(NONE->PENDING->[SUCCESS|FAIL]->NONE).

.Dave.