Re: [PATCH 2/3] drm/i915/guc: downgrade some DRM_ERROR() messages to DRM_WARN()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/07/16 10:20, Tvrtko Ursulin wrote:

On 11/07/16 19:01, Dave Gordon wrote:
Where we're going to continue regardless of the problem, rather than
fail, then the message should be a WARNing rather than an ERROR.

Signed-off-by: Dave Gordon <david.s.gordon@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_guc_submission.c | 18 +++++++-----------
  1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
b/drivers/gpu/drm/i915/i915_guc_submission.c
index 2112e02..e299b64 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -114,10 +114,8 @@ static int host2guc_action(struct intel_guc *guc,
u32 *data, u32 len)
          if (ret != -ETIMEDOUT)
              ret = -EIO;

-        DRM_ERROR("GUC: host2guc action 0x%X failed. ret=%d "
-                "status=0x%08X response=0x%08X\n",
-                data[0], ret, status,
-                I915_READ(SOFT_SCRATCH(15)));
+        DRM_WARN("Action 0x%X failed; ret=%d status=0x%08X
response=0x%08X\n",
+             data[0], ret, status, I915_READ(SOFT_SCRATCH(15)));

Hm, this does propagate the error code to the callers some which will
act and log the failure. Majority won't though - like suspend/resume
etc. In those cases it feels more like an error than a warning.

It's definitely something that shouldn't happen, so we need to log it; and it has to be done at this level because we don't pass enough information back to leave it to the caller. But OTOH this layer doesn't have enough information to determine just how serious a failure is.

So as a compromise the idea is to log a WARNING here, and then the caller can choose to:
1. pass the failure up (until we reach a layer with more context)
2. quietly disregard the failure and continue anyway
3. report an ERROR and fail/abort the process.

That way we should get all the useful information about the root cause of something that ends up as an ERROR, while neither ignoring nor being too verbose about failures from which we may ultimately recover.

For example, one of the callers (the doorbell h/w initialisation code) considers some failures as interesting but not critical (DEBUG level) but other instances of the exact same operation are fatal (ERROR).

          dev_priv->guc.action_fail += 1;
          dev_priv->guc.action_err = ret;
@@ -553,8 +551,8 @@ static int guc_ring_doorbell(struct
i915_guc_client *gc)
          if (db_ret.db_status == GUC_DOORBELL_DISABLED)
              break;

-        DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
-              db_cmp.cookie, db_ret.cookie);
+        DRM_WARN("Cookie mismatch. Expected %d, found %d\n",
+             db_cmp.cookie, db_ret.cookie);

This one is interesting, error is propagated out a bit but then ignored
in actual command submission.

If the above message means command will not be submitted error is
probably more appropriate. Or perhaps we cannot tell if the command was
submitted or not in this case?

Note that this is inside a retry loop. It shouldn't ever happen, but if it does we'll report it and try one more time. If it keeps happening (which would require active interference by some other party) we won't be able to ring the doorbell and the failure will be propagated back out of the GuC submission code. OTOH the caller then ignores it because "submission is not allowed to fail" (!)

And yes, it is then undefined as to whether the command has been submitted or not. If it hasn't we'll expect a GPU hang later.

.Dave.

          /* update the cookie to newly read cookie from GuC */
          db_cmp.cookie = db_ret.cookie;
@@ -726,8 +724,8 @@ static void guc_init_doorbell_hw(struct intel_guc
*guc)
      /* Restore to original value */
      err = guc_update_doorbell_id(guc, client, db_id);
      if (err)
-        DRM_ERROR("Failed to restore doorbell to %d, err %d\n",
-            db_id, err);
+        DRM_WARN("Failed to restore doorbell to %d, err %d\n",
+             db_id, err);

      for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
          i915_reg_t drbreg = GEN8_DRBREGL(i);
@@ -819,8 +817,6 @@ static void guc_init_doorbell_hw(struct intel_guc
*guc)
      return client;

  err:
-    DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
-
      guc_client_free(dev_priv, client);
      return NULL;
  }
@@ -998,7 +994,7 @@ int i915_guc_submission_enable(struct
drm_i915_private *dev_priv)
                    GUC_CTX_PRIORITY_KMD_NORMAL,
                    dev_priv->kernel_context);
      if (!client) {
-        DRM_ERROR("Failed to create execbuf guc_client\n");
+        DRM_ERROR("Failed to create normal GuC client!\n");
          return -ENOMEM;
      }



Regards,
Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux