On Thu, Sep 22, 2022 at 01:01:48PM -0700, John Harrison wrote: > On 9/22/2022 07:26, Dan Carpenter wrote: > > Hello Matthew Brost, > > > > The patch 6b540bf6f143: "drm/i915/guc: Implement multi-lrc > > submission" from Oct 14, 2021, leads to the following Smatch static > > checker warning: > > > > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:752 __guc_add_request() > > warn: refcount leak 'ce->ref.refcount.refs.counter': lines='752' > > > > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > 672 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq) > > 673 { > > 674 int err = 0; > > 675 struct intel_context *ce = request_to_scheduling_context(rq); > > 676 u32 action[3]; > > 677 int len = 0; > > 678 u32 g2h_len_dw = 0; > > 679 bool enabled; > > 680 > > 681 lockdep_assert_held(&rq->engine->sched_engine->lock); > > 682 > > 683 /* > > 684 * Corner case where requests were sitting in the priority list or a > > 685 * request resubmitted after the context was banned. > > 686 */ > > 687 if (unlikely(intel_context_is_banned(ce))) { > > 688 i915_request_put(i915_request_mark_eio(rq)); > > 689 intel_engine_signal_breadcrumbs(ce->engine); > > 690 return 0; > > 691 } > > 692 > > 693 GEM_BUG_ON(!atomic_read(&ce->guc_id.ref)); > > 694 GEM_BUG_ON(context_guc_id_invalid(ce)); > > 695 > > 696 if (context_policy_required(ce)) { > > 697 err = guc_context_policy_init_v70(ce, false); > > 698 if (err) > > 699 return err; > > 700 } > > 701 > > 702 spin_lock(&ce->guc_state.lock); > > 703 > > 704 /* > > 705 * The request / context will be run on the hardware when scheduling > > 706 * gets enabled in the unblock. For multi-lrc we still submit the > > 707 * context to move the LRC tails. > > 708 */ > > 709 if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce))) > > 710 goto out; > > 711 > > 712 enabled = context_enabled(ce) || context_blocked(ce); > > 713 > > 714 if (!enabled) { > > 715 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; > > 716 action[len++] = ce->guc_id.id; > > 717 action[len++] = GUC_CONTEXT_ENABLE; > > 718 set_context_pending_enable(ce); > > 719 intel_context_get(ce); > > 720 g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET; > > 721 } else { > > 722 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT; > > 723 action[len++] = ce->guc_id.id; > > 724 } > > 725 > > 726 err = intel_guc_send_nb(guc, action, len, g2h_len_dw); > > 727 if (!enabled && !err) { > > 728 trace_intel_context_sched_enable(ce); > > 729 atomic_inc(&guc->outstanding_submission_g2h); > > 730 set_context_enabled(ce); > > 731 > > 732 /* > > 733 * Without multi-lrc KMD does the submission step (moving the > > 734 * lrc tail) so enabling scheduling is sufficient to submit the > > 735 * context. This isn't the case in multi-lrc submission as the > > 736 * GuC needs to move the tails, hence the need for another H2G > > 737 * to submit a multi-lrc context after enabling scheduling. > > 738 */ > > 739 if (intel_context_is_parent(ce)) { > > 740 action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT; > > 741 err = intel_guc_send_nb(guc, action, len - 1, 0); > > > > Should this have a something like: > > > > if (err) > > intel_context_put(ce); > No. > > The ce has been marked as enabled above because the actual enable call > succeeded.? This is a secondary call to 'poke' the context. If this fails, > the context might not get scheduled in a timely manner but the tracking > state is still correct. The context is now in use by GuC and therefore must > be reference locked. And the 'context_enabled' flag is set on the i915 side > to note that the reference count is held. > > If you want to unwind at this point, you would need to send a > CONTEXT_MODE_SET(DISABLE) H2G message (amongst other things) and only once > that call has completed, can you call intel_context_put(ce). > > I don't get why the analyser would be claiming a leak here. I'm not sure if > it is just that the analyser is confused or if there is some other potential > route to a leak. Is it possible to get more details as to how it thinks the > leak can occur? Thanks, this helps me a lot! This is a Smatch static analysis thing I'm working on. It simple enough to silence the false positive. I add this line which says that after set_context_enabled() then ignore the reference counting. add_function_param_key_hook("set_context_enabled", &match_ignore, 0, "$->ref.refcount.refs.counter", NULL); The heuristic that the check is using is that if some error paths drop the refcount then all error paths should do it. So adding the if statement I suggested would silence the warning (but introduce a bug in the kernel). regards, dan carpenter