On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
> > On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@xxxxxxxxx> wrote:
> > >
> > > From: Matthew Brost <matthew.brost@xxxxxxxxx>
> > >
> > > Add non blocking CTB send fuction, intel_guc_send_nb. In order to
> > > support a non blocking CTB send fuction a spin lock is needed to
> >
> > 2x typos
> >
> > > protect the CTB descriptors fields. Also the non blocking call must
> > > not update the fence value as this value is owned by the blocking
> > > call (intel_guc_send).
> >
> > you probably mean "intel_guc_send_ct", as intel_guc_send is just a
> > wrapper around guc->send
>
> Ah, yes.
>
> > > The blocking CTB now must have a flow control mechanism to ensure the
> > > buffer isn't overrun. A lazy spin wait is used as we believe the flow
> > > control condition should be rare with a properly sized buffer. A retry
> > > counter is also implemented which fails H2G CTBs once a limit is
> > > reached, to prevent deadlock.
> > >
> > > The function, intel_guc_send_nb, is exported in this patch but unused.
> > > Several patches later in the series make use of this function.
> >
> > It's likely in yet another series
>
> Yes, it is.
>
> > > Cc: John Harrison <john.c.harrison@xxxxxxxxx>
> > > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
> > >  3 files changed, 91 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index e6400204a2bd..77c5af919ace 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> > >  	return guc->send(guc, action, len, response_buf, response_buf_size);
> > >  }
> > >
> > > +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32 len);
> > > +
> >
> > Hmm, this mismatch of guc/ct parameter breaks our layering. But we can
> > keep this layering intact by introducing some flags to the existing
> > guc_send() function. These flags could be passed as high bits in
> > action[0], like this:
>
> This seems reasonable.
>
> Prototyped this and I don't like it at all. First, 'action' is a const so
> what you are suggesting doesn't work unless that is changed.
I guess only explicit clearing of that flag from const action was a mistake,
rest should be ok.
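A minimal sketch of the distinction being drawn here, assuming the
GUC_ACTION_FLAG_DONT_WAIT definition from the example quoted further below
(the helper names are illustrative only): testing a bit in a const action
buffer is fine; it is only the in-place clear that breaks const.

static bool guc_action_dont_wait(const u32 *action)
{
	/* read-only test: legal on a const buffer */
	return action[0] & GUC_ACTION_FLAG_DONT_WAIT;
}

static u32 guc_action_code(const u32 *action)
{
	/* take a masked copy instead of doing *action &= ~FLAG in place */
	return action[0] & ~GUC_ACTION_FLAG_DONT_WAIT;
}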
> Also what if all bits in DW eventually mean something,
In all guc_send functions we only use lower 16 bits for action code. This action code is then passed in MMIO messages in bits 15:0 and in CTB messages in bits 31:16.
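For illustration, a sketch of the packing described above, using the shift
names from the quoted patch (GUC_CT_MSG_ACTION_SHIFT is 16 in the driver);
the GUC_ACTION_CODE_MASK macro and the function names are illustrative, not
the driver's own definitions:

#define GUC_ACTION_CODE_MASK	0xffff

/* MMIO transport: the action code goes out in bits 15:0 of the first reg */
static u32 guc_mmio_dw0(const u32 *action)
{
	return action[0] & GUC_ACTION_CODE_MASK;
}

/* CTB transport: the message header carries the action code in bits 31:16 */
static u32 guc_ct_msg_header(const u32 *action, u32 len)
{
	/* assumes action[0] holds only the 16-bit code, as the driver asserts */
	return (len << GUC_CT_MSG_LEN_SHIFT) |
	       (action[0] << GUC_CT_MSG_ACTION_SHIFT);
}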
> to me overloading a field isn't a good idea; if anything we should add
> another argument to guc->send().
guc->send already takes 5 params:

int (*send)(struct intel_guc *guc, const u32 *data, u32 len,
	    u32 *response_buf, u32 response_buf_size);

We can add explicit flags, but I see nothing wrong in using free top bits
from data[0] to pass these flags there.
> But I'd honestly prefer we just leave it as is. Non-blocking only applies
> to CTs (not MMIO) and we have a GEM_BUG_ON to protect us if this function
> is called incorrectly.
Well, I guess many would still prefer to be hit by a WARN from guc-send-nop
rather than a BUG_ON from guc-send-nb.
> Doing what you suggest just makes everything more complicated IMO.
I'm not sold that adding GUC_ACTION_FLAG_DONT_WAIT to action[0] (as shown
below) is that complicated.

Michal
> Matt
>
> > #define GUC_ACTION_FLAG_DONT_WAIT	0x80000000
> >
> > int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
> > {
> > 	u32 action[] = {
> > 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
> > 		rsa_offset
> > 	};
> >
> > 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> > }
> >
> > then the actual back-end of guc->send can take proper steps based on
> > this flag:
> >
> > @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
> >  	GEM_BUG_ON(!len);
> >  	GEM_BUG_ON(len > guc->send_regs.count);
> >
> > +	if (*action & GUC_ACTION_FLAG_DONT_WAIT)
> > +		return -EINVAL;
> > +
> > +	*action &= ~GUC_ACTION_FLAG_DONT_WAIT;
> > +
> >  	/* We expect only action code */
> >  	GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
> > @@
> > @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action, u32 len,
> >  	u32 status = ~0; /* undefined */
> >  	int ret;
> >
> > +	if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
> > +		GEM_BUG_ON(response_buf);
> > +		GEM_BUG_ON(response_buf_size);
> > +		return ctch_send_nb(ct, ctch, action, len);
> > +	}
> > +
> >  	mutex_lock(&guc->send_mutex);
> >  	ret = ctch_send(ct, ctch, action, len, response_buf, response_buf_size,
> > >
> > >  static inline void intel_guc_notify(struct intel_guc *guc)
> > >  {
> > >  	guc->notify(guc);
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > index b49115517510..e50d968b15d5 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > @@ -3,6 +3,8 @@
> > >   * Copyright © 2016-2019 Intel Corporation
> > >   */
> > >
> > > +#include <linux/circ_buf.h>
> > > +
> > >  #include "i915_drv.h"
> > >  #include "intel_guc_ct.h"
> > >
> > > @@ -12,6 +14,8 @@
> > >  #define CT_DEBUG_DRIVER(...)	do { } while (0)
> > >  #endif
> > >
> > > +#define MAX_RETRY 0x1000000
> > > +
> > >  struct ct_request {
> > >  	struct list_head link;
> > >  	u32 fence;
> > > @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
> > >  	/* we're using static channel owners */
> > >  	ct->host_channel.owner = CTB_OWNER_HOST;
> > >
> > > -	spin_lock_init(&ct->lock);
> > > +	spin_lock_init(&ct->request_lock);
> > > +	spin_lock_init(&ct->send_lock);
> > >  	INIT_LIST_HEAD(&ct->pending_requests);
> > >  	INIT_LIST_HEAD(&ct->incoming_requests);
> > >  	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
> > > @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct intel_guc_ct_channel *ctch)
> > >  static int ctb_write(struct intel_guc_ct_buffer *ctb,
> > >  		     const u32 *action,
> > >  		     u32 len /* in dwords */,
> > > -		     u32 fence,
> > > +		     u32 fence_value,
> > > +		     bool enable_fence,
> >
> > maybe we can just guarantee that fence=0 will never be used as a valid
> > fence id, then this flag could be replaced with a (fence != 0) check.
>
> Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2
> each time and this works.
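A minimal sketch of that scheme, assuming the channel's existing next_fence
counter (the increment-by-2 body is illustrative of the proposal, not what
the posted patch does):

static u32 ctch_get_next_fence(struct intel_guc_ct_channel *ctch)
{
	u32 fence = ctch->next_fence;

	/*
	 * next_fence starts at 1 and is always bumped by 2, so a valid
	 * fence is odd and never 0, even across u32 wraparound. ctb_write()
	 * could then treat fence == 0 as "no fence wanted" and drop the
	 * separate enable_fence flag.
	 */
	ctch->next_fence += 2;
	return fence;
}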
> > >  		     bool want_response)
> > >  {
> > >  	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
> > >  	 * DW2+: action data
> > >  	 */
> > >  	header = (len << GUC_CT_MSG_LEN_SHIFT) |
> > > -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
> > > +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
> >
> > Hmm, even if we ask the fw not to write back the fence to the descriptor,
> > IIRC current firmware will unconditionally write back the return status of
> > this non-blocking call, possibly overwriting the status of the blocked
> > call.
>
> Yes, known problem with the interface that needs to be fixed.
>
> > >  		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
> >
> > btw, if we switch all requests to expect a reply sent back over CTB, then
> > we can possibly drop the send_mutex in CTB paths, and block only when
> > there is no DONT_WAIT flag and we have to wait for a response.
>
> Rather just wait for the GuC to fix this.
>
> > >  		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
> > >
> > >  	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
> > > -			4, &header, 4, &fence,
> > > +			4, &header, 4, &fence_value,
> > >  			4 * (len - 1), &action[1]);
> > >
> > >  	cmds[tail] = header;
> > >  	tail = (tail + 1) % size;
> > > -	cmds[tail] = fence;
> > > +	cmds[tail] = fence_value;
> > >  	tail = (tail + 1) % size;
> > >
> > >  	for (i = 1; i < len; i++) {
> > > @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> > >  	return err;
> > >  }
> > >
> > > +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
> > > +{
> > > +	u32 head = READ_ONCE(desc->head);
> > > +	u32 space;
> > > +
> > > +	space = CIRC_SPACE(desc->tail, head, desc->size);
> > > +
> > > +	return space >= len;
> > > +}
> > > +
> > > +int intel_guc_send_nb(struct intel_guc_ct *ct,
> > > +		      const u32 *action,
> > > +		      u32 len)
> > > +{
> > > +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
> > > +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
> > > +	struct guc_ct_buffer_desc *desc = ctb->desc;
> > > +	int err;
> > > +
> > > +	GEM_BUG_ON(!ctch->enabled);
> > > +	GEM_BUG_ON(!len);
> > > +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > > +	lockdep_assert_held(&ct->send_lock);
> >
> > hmm, does it mean that now it's the caller's responsibility to spinlock
> > on the CT private lock? That is not how other guc_send() functions work.
>
> Yes, that's how I would like this to work, as I feel it gives more
> flexibility to the caller in the -EBUSY case. The caller can call
> intel_guc_send_nb again while still holding the lock, or it can release the
> lock and use a different form of flow control. Perhaps locking / unlocking
> should be exposed via static inlines rather than the caller directly
> manipulating the lock?
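Such static inlines might look like the sketch below (hypothetical helper
and caller names, not part of the patch); a caller could then keep holding
the lock across -EBUSY retries:

static inline void intel_guc_ct_send_lock(struct intel_guc_ct *ct)
{
	spin_lock(&ct->send_lock);
}

static inline void intel_guc_ct_send_unlock(struct intel_guc_ct *ct)
{
	spin_unlock(&ct->send_lock);
}

/* illustrative caller-side flow control, spinning under the lock */
static int example_caller(struct intel_guc_ct *ct, const u32 *action, u32 len)
{
	int err;

	intel_guc_ct_send_lock(ct);
	/* terminates: intel_guc_send_nb returns -EDEADLK after MAX_RETRY */
	while ((err = intel_guc_send_nb(ct, action, len)) == -EBUSY)
		cpu_relax();
	intel_guc_ct_send_unlock(ct);

	return err;
}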
> > > +
> > > +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
> > > +		ct->retry++;
> > > +		if (ct->retry >= MAX_RETRY)
> > > +			return -EDEADLK;
> > > +		else
> > > +			return -EBUSY;
> > > +	}
> > > +
> > > +	ct->retry = 0;
> > > +	err = ctb_write(ctb, action, len, 0, false, false);
> > > +	if (unlikely(err))
> > > +		return err;
> > > +
> > > +	intel_guc_notify(ct_to_guc(ct));
> > > +	return 0;
> > > +}
> > > +
> > >  static int ctch_send(struct intel_guc_ct *ct,
> > >  		     struct intel_guc_ct_channel *ctch,
> > >  		     const u32 *action,
> > > @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
> > >  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> > >  	GEM_BUG_ON(!response_buf && response_buf_size);
> > >
> > > +	/*
> > > +	 * We use a lazy spin wait loop here as we believe that if the CT
> > > +	 * buffers are sized correctly the flow control condition should be
> > > +	 * rare.
> > > +	 */
> > > +retry:
> > > +	spin_lock_irqsave(&ct->send_lock, flags);
> > > +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
> > > +		spin_unlock_irqrestore(&ct->send_lock, flags);
> > > +		ct->retry++;
> > > +		if (ct->retry >= MAX_RETRY)
> > > +			return -EDEADLK;
> >
> > I'm not sure what's better: a secret deadlock that is hard to reproduce,
> > or deadlocks that are easier to catch and so help us become deadlock-clean
>
> This is covering the case where the GuC has died, to avoid deadlock.
> Eventually we will have some GuC health check code that will trigger a full
> GPU reset if the GuC has died. We need a way for code spinning on the CTBs
> to exit.
>
> I've already tweaked this code locally a bit to use an atomic too, with the
> idea being the GuC health check code can set this value to have all code
> spinning on CTBs immediately return -EDEADLK when the GuC has died.
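A minimal sketch of that atomic (the dead field and both helper names are
hypothetical, describing the local tweak, not the posted patch):

/* hypothetical addition to struct intel_guc_ct:	atomic_t dead; */

/* set by the future GuC health check code once the GuC is declared dead */
static void ct_mark_guc_dead(struct intel_guc_ct *ct)
{
	atomic_set(&ct->dead, 1);
}

/* checked in the flow-control paths so anyone spinning on a full CTB
 * returns -EDEADLK immediately instead of spinning up to MAX_RETRY */
static int ct_deadlk_check(struct intel_guc_ct *ct)
{
	return unlikely(atomic_read(&ct->dead)) ? -EDEADLK : 0;
}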
> > > +		cpu_relax();
> > > +		goto retry;
> > > +	}
> > > +
> > > +	ct->retry = 0;
> > >  	fence = ctch_get_next_fence(ctch);
> > >  	request.fence = fence;
> > >  	request.status = 0;
> > >  	request.response_len = response_buf_size;
> > >  	request.response_buf = response_buf;
> > >
> > > -	spin_lock_irqsave(&ct->lock, flags);
> > > +	spin_lock(&ct->request_lock);
> > >  	list_add_tail(&request.link, &ct->pending_requests);
> > > -	spin_unlock_irqrestore(&ct->lock, flags);
> > > +	spin_unlock(&ct->request_lock);
> > >
> > > -	err = ctb_write(ctb, action, len, fence, !!response_buf);
> > > +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
> > > +	spin_unlock_irqrestore(&ct->send_lock, flags);
> > >  	if (unlikely(err))
> > >  		goto unlink;
> > >
> > > @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
> > >  	}
> > >
> > >  unlink:
> > > -	spin_lock_irqsave(&ct->lock, flags);
> > > +	spin_lock_irqsave(&ct->request_lock, flags);
> > >  	list_del(&request.link);
> > > -	spin_unlock_irqrestore(&ct->lock, flags);
> > > +	spin_unlock_irqrestore(&ct->request_lock, flags);
> > >
> > >  	return err;
> > >  }
> > > @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
> > >  	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
> > >
> > > -	spin_lock(&ct->lock);
> > > +	spin_lock(&ct->request_lock);
> > >  	list_for_each_entry(req, &ct->pending_requests, link) {
> > >  		if (unlikely(fence != req->fence)) {
> > >  			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
> > > @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
> > >  		found = true;
> > >  		break;
> > >  	}
> > > -	spin_unlock(&ct->lock);
> > > +	spin_unlock(&ct->request_lock);
> > >
> > >  	if (!found)
> > >  		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
> > > @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
> > >  	u32 *payload;
> > >  	bool done;
> > >
> > > -	spin_lock_irqsave(&ct->lock, flags);
> > > +	spin_lock_irqsave(&ct->request_lock, flags);
> > >  	request = list_first_entry_or_null(&ct->incoming_requests,
> > >  					   struct ct_incoming_request, link);
> > >  	if (request)
> > >  		list_del(&request->link);
> > >  	done = !!list_empty(&ct->incoming_requests);
> > > -	spin_unlock_irqrestore(&ct->lock, flags);
> > > +	spin_unlock_irqrestore(&ct->request_lock, flags);
> > >
> > >  	if (!request)
> > >  		return true;
> > > @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
> > >  	}
> > >
> > >  	memcpy(request->msg, msg, 4 * msglen);
> > >
> > > -	spin_lock_irqsave(&ct->lock, flags);
> > > +	spin_lock_irqsave(&ct->request_lock, flags);
> > >  	list_add_tail(&request->link, &ct->incoming_requests);
> > > -	spin_unlock_irqrestore(&ct->lock, flags);
> > > +	spin_unlock_irqrestore(&ct->request_lock, flags);
> > >
> > >  	queue_work(system_unbound_wq, &ct->worker);
> > >  	return 0;
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > index 7c24d83f5c24..bc670a796bd8 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > > @@ -62,8 +62,11 @@ struct intel_guc_ct {
> > >  	struct intel_guc_ct_channel host_channel;
> > >  	/* other channels are tbd */
> > >
> > > -	/** @lock: protects pending requests list */
> > > -	spinlock_t lock;
> > > +	/** @request_lock: protects pending requests list */
> > > +	spinlock_t request_lock;
> > > +
> > > +	/** @send_lock: protects h2g channel */
> > > +	spinlock_t send_lock;
> > >
> > >  	/** @pending_requests: list of requests waiting for response */
> > >  	struct list_head pending_requests;
> > > @@ -73,6 +76,9 @@ struct intel_guc_ct {
> > >  	/** @worker: worker for handling incoming requests */
> > >  	struct work_struct worker;
> > > +
> > > +	/** @retry: the number of times a H2G CTB has been retried */
> > > +	u32 retry;
> > >  };
> > >
> > >  void intel_guc_ct_init_early(struct intel_guc_ct *ct);
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx