On ke, 2017-02-15 at 18:28 -0800, Daniele Ceraolo Spurio wrote: > > On 14/02/17 05:53, Joonas Lahtinen wrote: > > > > Started adding proper teardown to guc_client_alloc, ended up removing > > quite a few dead ends where errors communicating with the GuC were > > silently ignored. There also seemed to be quite a few erronous > > teardown actions performed in case of an error (ordering wrong). > > > > v2: > > - Increase function symmetry/proximity (Michal/Daniele) > > - Fix __reserve_doorbell accounting for high priority (Daniele) > > - Call __update_doorbell_desc! (Daniele) > > - Isolate __guc_{,de}allocate_doorbell (Michal/Daniele) > > > > Signed-off-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx> > > Cc: Arkadiusz Hiler <arkadiusz.hiler@xxxxxxxxx> > > Cc: Oscar Mateo <oscar.mateo@xxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> <SNIP> > > +static bool has_doorbell(struct i915_guc_client *client) > > +{ > > + if (client->doorbell_id == GUC_DOORBELL_INVALID) > > + return false; > > > > - /* Activate the new doorbell */ > > - __set_bit(new_id, doorbell_bitmap); > > + return test_bit(client->doorbell_id, client->guc->doorbell_bitmap); > > Should we warn/gem_bug if client->doorbell_id != GUC_DOORBELL_INVALID > and the bit in guc->doorbell_bitmap is not set? Not sure if you plan to > decouple them more in the future, but with the current code it should > always be impossible to have a valid doorbell without the bit in the > bitmask being set. Maybe we can just return client->doorbell_id != > GUC_DOORBELL_INVALID here if we have no plan for a case where they can > be out of sync. Kinda a leftover from restructuring, the selection and reservation were a different thing at some point. Changed into GEM_BUG_ON. > > +static int __destroy_doorbell(struct i915_guc_client *client) > > { > > - (void)guc_update_doorbell_id(guc, client, GUC_INVALID_DOORBELL_ID); > > + struct guc_doorbell_info *doorbell; > > > > - /* XXX: wait for any interrupts */ > > - /* XXX: wait for workqueue to drain */ > > + doorbell = __get_doorbell(client); > > + doorbell->db_status = GUC_DOORBELL_DISABLED; > > + doorbell->cookie = 0; > > + > > Not from this patch (but since we're at it...), I did a bit of digging > and I've found that doorbell release flow requires SW to poll the > GEN8_DRBREGL(db_id) register after updating doorbell->db_status to wait > for the GEN8_DRB_VALID bit to go to zero. This ensures that any pending > events are processed before we call into GuC to release the doorbell. > Note that GuC will fail the DEALLOCATE_DOORBELL action if the bit has > not gone to zero yet. This is currently not an issue, probably because > we use a single doorbell and we don't ring it near release time, but the > situation could change in the future so I believe it is worth to fix it > now. I can follow up with an additional patch if you want to keep this > one as refactoring only. Ack, add a follow-up on top of your series. > > -static uint32_t select_doorbell_cacheline(struct intel_guc *guc) > > + return 0; > > +} > > + > > +static unsigned long __reserve_cacheline(struct intel_guc* guc) > > "reserve_cacheline" doesn't really reflect what happens, because more > doorbells can use the same cacheline (while they are on different > physical pages) and there is no concept of unreserving the cacheline. > The idea is to try to avoid having lots of doorbells on the same > cacheline if possible to make the monitoring more efficient on the HW, > so I'd keep the "select_cacheline" naming for this function. We should > probably also look at modifying the function to use something more > elaborated than a simple round robin scheme to ensure doorbells are > equally distribuited on cachelines, but that can probably wait until we > use more doorbells. Renamed. > > /* > > * Borrow the first client to set up & tear down each unused doorbell > > * in turn, to ensure that all doorbell h/w is (re)initialised. > > */ > > -static void guc_init_doorbell_hw(struct intel_guc *guc) > > +static int guc_init_doorbell_hw(struct intel_guc *guc) > > { > > struct i915_guc_client *client = guc->execbuf_client; > > - uint16_t db_id; > > - int i, err; > > + int err; > > + int i; > > > > - guc_disable_doorbell(guc, client); > > + if (has_doorbell(client)) > > + destroy_doorbell(client); > > > > - for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { > > - /* Skip if doorbell is OK */ > > - if (guc_doorbell_check(guc, i)) > > + for (i = 0; i < GUC_NUM_DOORBELLS; ++i) { > > + if (doorbell_ok(guc, i)) > > continue; > > > > As mentioned in the previous patch version, here we assume that all the > doorbells should be disabled an we want to reset HW that has been left > enabled from previous users, so the doorbell_bitmap should be clear. > Maybe adding a simple > > GEM_BUG_ON(test_bit(i, guc->doorbell_bitmap)); > > will help making sure that we're not leaving bits set when > releasing/resetting. The functions don't actually even touch doorbell_bitmap, they just nuke all active clients. That's what the previous code did, as I read it. > > - err = guc_update_doorbell_id(guc, client, i); > > - if (err) > > - DRM_DEBUG_DRIVER("Doorbell %d update failed, err %d\n", > > - i, err); > > + err = __reset_doorbell(client, i); > > If the reset fails the doorbell is in a bad state, so it might be worth > to ensure that the bit is set in the bitmask to make sure we don't > assign that doorbell to any client, but we'd have to make sure to clear > the bitmask on GuC reset (see comment above). Clearing the bits now in guc_init_doorbell_hw(). > > + WARN(err, "Doorbell %d reset failed, err %d\n", i, err); > > } > > > > - db_id = select_doorbell_register(guc, client->priority); > > - WARN_ON(db_id == GUC_INVALID_DOORBELL_ID); > > - > > - err = guc_update_doorbell_id(guc, client, db_id); > > + err = __reserve_doorbell(client); > > if (err) > > - DRM_WARN("Failed to restore doorbell to %d, err %d\n", > > - db_id, err); > > + return err; > > Continuing the discussion from the previous patch revision: > > <snip> > > Shouldn't this be create_doorbell() instead of reserve? > > That would end up calling the __update_doorbell, which we can't as the > desc is for some strange reason not mapped yet. > </snip> > > As far as I can see, everything should already be allocated and mapped > here. Aren't we anyway already calling __update_doorbell both in the > client_alloc and in __reset_doorbell above? > Also, where are we re-creating the doorbell that we destroyed at the > beginning of the function? > Hmm, refactored the extra destroy cycle out. <SNIP> > > @@ -753,27 +781,35 @@ guc_client_alloc(struct drm_i915_private *dev_priv, > > guc_proc_desc_init(guc, client); > > guc_ctx_desc_init(guc, client); > > > > - /* For runtime client allocation we need to enable the doorbell. Not > > - * required yet for the static execbuf_client as this special kernel > > - * client is enabled from i915_guc_submission_enable(). > > - * > > - * guc_update_doorbell_id(guc, client, db_id); > > - */ > > + /* For runtime client allocation we need to enable the doorbell. */ > > this comment is a bit unclear now because you're not deferring the > initialization of execbuf_client to guc_init_doorbell_hw anymore, so > there is no difference between execbuf_client and "runtime" clients. > Maybe we can just remove the comment. Removed. > Note that creating the doorbell here will cause it to be destroyed and > re-allocated when i915_guc_submission_enable is called. A worth > compromise IMO to avoid special cases, but worth pointing out :) Refactored a bit. Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx