On 9/22/2022 07:26, Dan Carpenter wrote:
Hello Matthew Brost,
The patch 6b540bf6f143: "drm/i915/guc: Implement multi-lrc
submission" from Oct 14, 2021, leads to the following Smatch static
checker warning:
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:752 __guc_add_request()
warn: refcount leak 'ce->ref.refcount.refs.counter': lines='752'
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
672 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
673 {
674 int err = 0;
675 struct intel_context *ce = request_to_scheduling_context(rq);
676 u32 action[3];
677 int len = 0;
678 u32 g2h_len_dw = 0;
679 bool enabled;
680
681 lockdep_assert_held(&rq->engine->sched_engine->lock);
682
683 /*
684 * Corner case where requests were sitting in the priority list or a
685 * request resubmitted after the context was banned.
686 */
687 if (unlikely(intel_context_is_banned(ce))) {
688 i915_request_put(i915_request_mark_eio(rq));
689 intel_engine_signal_breadcrumbs(ce->engine);
690 return 0;
691 }
692
693 GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
694 GEM_BUG_ON(context_guc_id_invalid(ce));
695
696 if (context_policy_required(ce)) {
697 err = guc_context_policy_init_v70(ce, false);
698 if (err)
699 return err;
700 }
701
702 spin_lock(&ce->guc_state.lock);
703
704 /*
705 * The request / context will be run on the hardware when scheduling
706 * gets enabled in the unblock. For multi-lrc we still submit the
707 * context to move the LRC tails.
708 */
709 if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce)))
710 goto out;
711
712 enabled = context_enabled(ce) || context_blocked(ce);
713
714 if (!enabled) {
715 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
716 action[len++] = ce->guc_id.id;
717 action[len++] = GUC_CONTEXT_ENABLE;
718 set_context_pending_enable(ce);
719 intel_context_get(ce);
720 g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
721 } else {
722 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
723 action[len++] = ce->guc_id.id;
724 }
725
726 err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
727 if (!enabled && !err) {
728 trace_intel_context_sched_enable(ce);
729 atomic_inc(&guc->outstanding_submission_g2h);
730 set_context_enabled(ce);
731
732 /*
733 * Without multi-lrc KMD does the submission step (moving the
734 * lrc tail) so enabling scheduling is sufficient to submit the
735 * context. This isn't the case in multi-lrc submission as the
736 * GuC needs to move the tails, hence the need for another H2G
737 * to submit a multi-lrc context after enabling scheduling.
738 */
739 if (intel_context_is_parent(ce)) {
740 action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
741 err = intel_guc_send_nb(guc, action, len - 1, 0);
Should this have a something like:
if (err)
intel_context_put(ce);
No.
The ce has been marked as enabled above because the actual enable call
succeeded. This is a secondary call to 'poke' the context. If this
fails, the context might not get scheduled in a timely manner but the
tracking state is still correct. The context is now in use by GuC and
therefore must be reference locked. And the 'context_enabled' flag is
set on the i915 side to note that the reference count is held.
If you want to unwind at this point, you would need to send a
CONTEXT_MODE_SET(DISABLE) H2G message (amongst other things) and only
once that call has completed, can you call intel_context_put(ce).
I don't get why the analyser would be claiming a leak here. I'm not sure
if it is just that the analyser is confused or if there is some other
potential route to a leak. Is it possible to get more details as to how
it thinks the leak can occur?
John.
742 }
743 } else if (!enabled) {
744 clr_context_pending_enable(ce);
745 intel_context_put(ce);
746 }
747 if (likely(!err))
748 trace_i915_request_guc_submit(rq);
749
750 out:
751 spin_unlock(&ce->guc_state.lock);
--> 752 return err;
753 }
regards,
dan carpenter