On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>> On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@xxxxxxxxx> wrote:
>>> From: Matthew Brost <matthew.brost@xxxxxxxxx>
>>>
>>> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>> support a non blocking CTB send fuction a spin lock is needed to
>>
>> 2x typos
>>
>>> protect the CTB descriptors fields. Also the non blocking call must
>>> not update the fence value as this value is owned by the blocking
>>> call (intel_guc_send).
>>
>> you probably mean "intel_guc_send_ct", as intel_guc_send is just a
>> wrapper around guc->send
>
> Ah, yes.
>
>>> The blocking CTB now must have a flow control mechanism to ensure the
>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>> control condition should be rare with properly sized buffer. A retry
>>> counter is also implemented which fails H2G CTBs once a limit is
>>> reached to prevent deadlock. The function, intel_guc_send_nb, is
>>> exported in this patch but unused. Several patches later in the
>>> series make use of this function.
>>
>> It's likely in yet another series
>
> Yes, it is.
>
>>> Cc: John Harrison <john.c.harrison@xxxxxxxxx>
>>> Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>  3 files changed, 91 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index e6400204a2bd..77c5af919ace 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>  	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>  }
>>>
>>> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32 len);
>>> +
>>
>> Hmm, this guc/ct parameter mismatch breaks our layering.
>> But we can keep this layering intact by introducing some flags to the
>> existing guc_send() function. These flags could be passed as high bits
>> in action[0], like this:
>
> This seems reasonable.
Prototyped this and I don't like it at all. First, 'action' is const, so
what you are suggesting doesn't work unless that is changed. Also, what
if all the bits in the DW eventually mean something? To me, overloading
a field isn't a good idea; if anything we should add another argument to
guc->send(). But I'd honestly prefer we just leave it as is.
Non-blocking only applies to CTs (not MMIO) and we have a GEM_BUG_ON to
protect us if this function is called incorrectly. Doing what you
suggest just makes everything more complicated IMO.

Matt
>> #define GUC_ACTION_FLAG_DONT_WAIT	0x80000000
>>
>> int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>> {
>> 	u32 action[] = {
>> 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>> 		rsa_offset
>> 	};
>>
>> 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>> }
>>
>> then the actual back-end of guc->send can take proper steps based on
>> this flag:
>>
>> @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
>>  	GEM_BUG_ON(!len);
>>  	GEM_BUG_ON(len > guc->send_regs.count);
>>
>> +	if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>> +		return -EINVAL;
>> +	*action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>> +
>>  	/* We expect only action code */
>>  	GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>
>> @@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action, u32 len,
>>  	u32 status = ~0; /* undefined */
>>  	int ret;
>>
>> +	if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>> +		GEM_BUG_ON(response_buf);
>> +		GEM_BUG_ON(response_buf_size);
>> +		return ctch_send_nb(ct, ctch, action, len);
>> +	}
>> +
>>  	mutex_lock(&guc->send_mutex);
>>  	ret = ctch_send(ct, ctch, action, len, response_buf, response_buf_size,
>>
>>>  static inline void intel_guc_notify(struct intel_guc *guc)
>>>  {
>>>  	guc->notify(guc);
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index b49115517510..e50d968b15d5 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -3,6 +3,8 @@
>>>   * Copyright © 2016-2019 Intel Corporation
>>>   */
>>>
>>> +#include <linux/circ_buf.h>
>>> +
>>>  #include "i915_drv.h"
>>>  #include "intel_guc_ct.h"
>>>
>>> @@ -12,6 +14,8 @@
>>>  #define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>  #endif
>>>
>>> +#define MAX_RETRY 0x1000000
>>> +
>>>  struct ct_request {
>>>  	struct list_head link;
>>>  	u32 fence;
>>> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>  	/* we're using static channel owners */
>>>  	ct->host_channel.owner = CTB_OWNER_HOST;
>>>
>>> -	spin_lock_init(&ct->lock);
>>> +	spin_lock_init(&ct->request_lock);
>>> +	spin_lock_init(&ct->send_lock);
>>>  	INIT_LIST_HEAD(&ct->pending_requests);
>>>  	INIT_LIST_HEAD(&ct->incoming_requests);
>>>  	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct intel_guc_ct_channel *ctch)
>>>  static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>  		     const u32 *action,
>>>  		     u32 len /* in dwords */,
>>> -		     u32 fence,
>>> +		     u32 fence_value,
>>> +		     bool enable_fence,
>>
>> maybe we can just guarantee that fence=0 will never be used as a valid
>> fence id, then this flag could be replaced with a (fence != 0) check.
>
> Yes, again seems reasonable. Initialize next_fence = 1, then increment
> by 2 each time and this works.
>
>>>  		     bool want_response)
>>>  {
>>>  	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>  	 * DW2+: action data
>>>  	 */
>>>  	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>
>> Hmm, even if we ask the fw to not write back the fence to the
>> descriptor, IIRC current firmware will unconditionally write back the
>> return status of this non-blocking call, possibly overwriting the
>> status of the blocked call.
>
> Yes, known problem with the interface that needs to be fixed.
>
>>>  		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>
>> btw, if we switch all requests to expect a reply sent back over CTB,
>> then we can possibly drop the send_mutex in CTB paths, and block only
>> when there is no DONT_WAIT flag and we have to wait for a response.
>
> Rather just wait for the GuC to fix this.
>
>>>  		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>
>>>  	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>> -			4, &header, 4, &fence,
>>> +			4, &header, 4, &fence_value,
>>>  			4 * (len - 1), &action[1]);
>>>
>>>  	cmds[tail] = header;
>>>  	tail = (tail + 1) % size;
>>> -	cmds[tail] = fence;
>>> +	cmds[tail] = fence_value;
>>>  	tail = (tail + 1) % size;
>>>
>>>  	for (i = 1; i < len; i++) {
>>> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>  	return err;
>>>  }
>>>
>>> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
>>> +{
>>> +	u32 head = READ_ONCE(desc->head);
>>> +	u32 space;
>>> +
>>> +	space = CIRC_SPACE(desc->tail, head, desc->size);
>>> +
>>> +	return space >= len;
>>> +}
>>> +
>>> +int intel_guc_send_nb(struct intel_guc_ct *ct,
>>> +		      const u32 *action,
>>> +		      u32 len)
>>> +{
>>> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> +	int err;
>>> +
>>> +	GEM_BUG_ON(!ctch->enabled);
>>> +	GEM_BUG_ON(!len);
>>> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>> +	lockdep_assert_held(&ct->send_lock);
>>
>> hmm, does it mean that now it's the caller's responsibility to spinlock
>> on the CT private lock? That is not how the other guc_send() functions
>> work.
>
> Yes, that's how I would like this to work, as I feel it gives more
> flexibility to the caller in the -EBUSY case. The caller can call
> intel_guc_send_nb again while still holding the lock, or it can release
> the lock and use a different form of flow control.
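The ctb_has_room() check in the patch leans on CIRC_SPACE() from
<linux/circ_buf.h>, which computes the free space of a power-of-two ring
from its producer and consumer indices. Below is a minimal user-space
sketch of that check, not the kernel code itself: the macro is copied
from circ_buf.h, and the struct merely mirrors the head/tail/size fields
of guc_ct_buffer_desc (len must be in the same units as head/tail).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From <linux/circ_buf.h>: free space in a circular buffer, always
 * leaving one slot empty so head == tail unambiguously means "empty".
 * size must be a power of two.
 */
#define CIRC_SPACE(head, tail, size) (((tail) - ((head) + 1)) & ((size) - 1))

struct ct_desc {
	uint32_t head;	/* consumer index, advanced by the GuC */
	uint32_t tail;	/* producer index, advanced by the host */
	uint32_t size;	/* ring size, power of two */
};

/* Mirrors the patch's ctb_has_room(): the host produces at 'tail', so
 * the GuC descriptor's tail is passed as CIRC_SPACE()'s producer index.
 */
static bool ctb_has_room(const struct ct_desc *desc, uint32_t len)
{
	uint32_t space = CIRC_SPACE(desc->tail, desc->head, desc->size);

	return space >= len;
}
```

Note the kernel version reads desc->head through READ_ONCE() because the
GuC updates it concurrently; the sketch drops that detail.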
> Perhaps locking / unlocking should be exposed via static inlines rather
> than the caller directly manipulating the lock?
>
>>> +
>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>> +		ct->retry++;
>>> +		if (ct->retry >= MAX_RETRY)
>>> +			return -EDEADLK;
>>> +		else
>>> +			return -EBUSY;
>>> +	}
>>> +
>>> +	ct->retry = 0;
>>> +	err = ctb_write(ctb, action, len, 0, false, false);
>>> +	if (unlikely(err))
>>> +		return err;
>>> +
>>> +	intel_guc_notify(ct_to_guc(ct));
>>> +	return 0;
>>> +}
>>> +
>>>  static int ctch_send(struct intel_guc_ct *ct,
>>>  		     struct intel_guc_ct_channel *ctch,
>>>  		     const u32 *action,
>>> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>  	GEM_BUG_ON(!response_buf && response_buf_size);
>>>
>>> +	/*
>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>> +	 * buffers are sized correctly the flow control condition should be
>>> +	 * rare.
>>> +	 */
>>> +retry:
>>> +	spin_lock_irqsave(&ct->send_lock, flags);
>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>> +		spin_unlock_irqrestore(&ct->send_lock, flags);
>>> +		ct->retry++;
>>> +		if (ct->retry >= MAX_RETRY)
>>> +			return -EDEADLK;
>>
>> I'm not sure what's better: having a secret deadlock that's hard to
>> reproduce, or deadlocks that are easier to catch and help us get
>> deadlock-clean.
>
> This is covering the case where the GuC has died, to avoid deadlock.
> Eventually we will have some GuC health check code that will trigger a
> full GPU reset if the GuC has died. We need a way for code spinning on
> the CTBs to exit.
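The retry counter gives the lazy spin wait a bounded exit: once MAX_RETRY
consecutive attempts find the ring full, the send fails with -EDEADLK
instead of spinning forever on a dead GuC. Here is a user-space sketch of
just that decision logic; MAX_RETRY matches the patch, while the atomic
guc_dead flag is a hypothetical stand-in for the future health-check hook
discussed below and is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RETRY 0x1000000

struct ct_state {
	uint32_t retry;		/* consecutive attempts that found the ring full */
	atomic_bool guc_dead;	/* hypothetical: set by GuC health-check code */
};

/* One send attempt's flow-control decision: 0 when there is room,
 * -EBUSY to ask the caller to retry (spin or back off), -EDEADLK once
 * we have given up on the GuC ever draining the ring.
 */
static int ct_flow_control(struct ct_state *ct, bool has_room)
{
	if (atomic_load(&ct->guc_dead))
		return -EDEADLK;	/* fail fast rather than spin */

	if (!has_room) {
		if (++ct->retry >= MAX_RETRY)
			return -EDEADLK;
		return -EBUSY;
	}

	ct->retry = 0;			/* made progress, reset the counter */
	return 0;
}
```

The blocking path in the patch loops on the -EBUSY case with cpu_relax();
the non-blocking path returns it to the caller instead.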
> I've already tweaked this code locally a bit to use an atomic too, with
> the idea being the GuC health check code can set this value to have all
> code spinning on CTBs immediately return -EDEADLK when the GuC has
> died.
>
>>> +		cpu_relax();
>>> +		goto retry;
>>> +	}
>>> +
>>> +	ct->retry = 0;
>>>  	fence = ctch_get_next_fence(ctch);
>>>  	request.fence = fence;
>>>  	request.status = 0;
>>>  	request.response_len = response_buf_size;
>>>  	request.response_buf = response_buf;
>>>
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock(&ct->request_lock);
>>>  	list_add_tail(&request.link, &ct->pending_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock(&ct->request_lock);
>>>
>>> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>  	if (unlikely(err))
>>>  		goto unlink;
>>>
>>> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>  	}
>>>
>>>  unlink:
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>  	list_del(&request.link);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>
>>>  	return err;
>>>  }
>>> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>>  	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>
>>> -	spin_lock(&ct->lock);
>>> +	spin_lock(&ct->request_lock);
>>>  	list_for_each_entry(req, &ct->pending_requests, link) {
>>>  		if (unlikely(fence != req->fence)) {
>>>  			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>>  		found = true;
>>>  		break;
>>>  	}
>>> -	spin_unlock(&ct->lock);
>>> +	spin_unlock(&ct->request_lock);
>>>
>>>  	if (!found)
>>>  		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
>>>  	u32 *payload;
>>>  	bool done;
>>>
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>  	request = list_first_entry_or_null(&ct->incoming_requests,
>>>  					   struct ct_incoming_request, link);
>>>  	if (request)
>>>  		list_del(&request->link);
>>>  	done = !!list_empty(&ct->incoming_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>
>>>  	if (!request)
>>>  		return true;
>>> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>>>  	}
>>>  	memcpy(request->msg, msg, 4 * msglen);
>>>
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>  	list_add_tail(&request->link, &ct->incoming_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>
>>>  	queue_work(system_unbound_wq, &ct->worker);
>>>  	return 0;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 7c24d83f5c24..bc670a796bd8 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>  	struct intel_guc_ct_channel host_channel;
>>>  	/* other channels are tbd */
>>>
>>> -	/** @lock: protects pending requests list */
>>> -	spinlock_t lock;
>>> +	/** @request_lock: protects pending requests list */
>>> +	spinlock_t request_lock;
>>> +
>>> +	/** @send_lock: protects h2g channel */
>>> +	spinlock_t send_lock;
>>>
>>>  	/** @pending_requests: list of requests waiting for response */
>>>  	struct list_head pending_requests;
>>> @@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>
>>>  	/** @worker: worker for handling incoming requests */
>>>  	struct work_struct worker;
>>> +
>>> +	/** @retry: the number of times a H2G CTB has been retried */
>>> +	u32 retry;
>>>  };
>>>
>>>  void intel_guc_ct_init_early(struct intel_guc_ct *ct);

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx