Re: [PATCH 2/2] drm/v3d: Appease lockdep while updating GPU stats

Hi Tvrtko,

On 8/12/24 06:12, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>

Lockdep thinks our seqcount_t usage is unsafe because the update path can
run both from irq and from worker context:

  [ ] ================================
  [ ] WARNING: inconsistent lock state
  [ ] 6.10.3-v8-16k-numa #159 Tainted: G        WC
  [ ] --------------------------------
  [ ] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
  [ ] swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  [ ] ffff80003d7c08d0 (&v3d_priv->stats[i].lock){?.+.}-{0:0}, at: v3d_irq+0xc8/0x660 [v3d]
  [ ] {HARDIRQ-ON-W} state was registered at:
  [ ]   lock_acquire+0x1f8/0x328
  [ ]   v3d_job_start_stats.isra.0+0xd8/0x218 [v3d]
  [ ]   v3d_bin_job_run+0x23c/0x388 [v3d]
  [ ]   drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
  [ ]   process_one_work+0x62c/0xb48
  [ ]   worker_thread+0x468/0x5b0
  [ ]   kthread+0x1c4/0x1e0
  [ ]   ret_from_fork+0x10/0x20
  [ ] irq event stamp: 337094
  [ ] hardirqs last  enabled at (337093): [<ffffc0008144ce7c>] default_idle_call+0x11c/0x140
  [ ] hardirqs last disabled at (337094): [<ffffc0008144a354>] el1_interrupt+0x24/0x58
  [ ] softirqs last  enabled at (337082): [<ffffc00080061d90>] handle_softirqs+0x4e0/0x538
  [ ] softirqs last disabled at (337073): [<ffffc00080010364>] __do_softirq+0x1c/0x28
  [ ]
  [ ] other info that might help us debug this:
  [ ]  Possible unsafe locking scenario:

  [ ]        CPU0
  [ ]        ----
  [ ]   lock(&v3d_priv->stats[i].lock);
  [ ]   <Interrupt>
  [ ]     lock(&v3d_priv->stats[i].lock);
  [ ]
  [ ]  *** DEADLOCK ***

  [ ] no locks held by swapper/0/0.
  [ ]
  [ ] stack backtrace:
  [ ] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        WC         6.10.3-v8-16k-numa #159
  [ ] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
  [ ] Call trace:
  [ ]  dump_backtrace+0x170/0x1b8
  [ ]  show_stack+0x20/0x38
  [ ]  dump_stack_lvl+0xb4/0xd0
  [ ]  dump_stack+0x18/0x28
  [ ]  print_usage_bug+0x3cc/0x3f0
  [ ]  mark_lock+0x4d0/0x968
  [ ]  __lock_acquire+0x784/0x18c8
  [ ]  lock_acquire+0x1f8/0x328
  [ ]  v3d_job_update_stats+0xec/0x2e0 [v3d]
  [ ]  v3d_irq+0xc8/0x660 [v3d]
  [ ]  __handle_irq_event_percpu+0x1f8/0x488
  [ ]  handle_irq_event+0x88/0x128
  [ ]  handle_fasteoi_irq+0x298/0x408
  [ ]  generic_handle_domain_irq+0x50/0x78

But it is a false positive because every queue-stats pair has its own
lock and jobs on a given queue run strictly one at a time.
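
For reference, the write side lockdep complains about is roughly the
following seqcount section (paraphrased from v3d_sched.c; lockdep models
write_seqcount_begin() as a lock acquisition, which is what produces the
report above):

  static void
  v3d_stats_update(struct v3d_stats *stats, u64 now)
  {
  	/* Lockdep treats this as taking stats->lock. */
  	write_seqcount_begin(&stats->lock);
  	stats->enabled_ns += now - stats->start_ns;
  	stats->jobs_completed++;
  	stats->start_ns = 0;
  	write_seqcount_end(&stats->lock);
  }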

Nevertheless we can appease lockdep by doing two things:

1. Split the locks into two classes (GPU and CPU):

Because only GPU jobs update the stats from irq context, no further
changes are required for the CPU-based queues.

2. Disable local interrupts in the GPU stats update sections:

This convinces lockdep that all GPU job update sides consistently run
with interrupts disabled. Again, this does not fix a real locking issue,
but it avoids an alarming false-positive lockdep splat.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Fixes: 6abe93b621ab ("drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler")
Cc: Maíra Canal <mcanal@xxxxxxxxxx>
---
Splitting the locks into two classes is perhaps too complicated; instead
we could use the new v3d_job_start_stats_irqsave() helper from the CPU
jobs too. I *think* that would fix the false positive as well.
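
For the CPU-side queues that would be a mechanical change along these
lines (an untested sketch; it assumes, from memory, that
v3d_cpu_job_run() and v3d_cache_clean_job_run() are the callers which
start stats for the V3D_CPU and V3D_CACHE_CLEAN queues):

  /* in v3d_cpu_job_run(): */
  -	v3d_job_start_stats(&job->base, V3D_CPU);
  +	v3d_job_start_stats_irqsave(&job->base, V3D_CPU);

  /* in v3d_cache_clean_job_run(): */
  -	v3d_job_start_stats(job, V3D_CACHE_CLEAN);
  +	v3d_job_start_stats_irqsave(job, V3D_CACHE_CLEAN);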

The naming of v3d_job_start_stats_irqsave() is also a bit dodgy (given
that the irqsave part depends on lockdep), but I have no better ideas at
the moment.

Having said this... perhaps simply adding the #ifdef dance, with a
comment, to the existing v3d_job_start_stats() would be better? It would
be a much shorter and very localised patch, with perhaps no new
downsides. (A rough sketch follows below.)
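
Concretely, that variant could look something like this (an untested
sketch; the existing body of v3d_job_start_stats() stays as-is and is
elided here):

  static void
  v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
  {
  #ifdef CONFIG_LOCKDEP
  	unsigned long flags;

  	/*
  	 * Not needed for correctness; this only keeps lockdep's
  	 * irq-state tracking of the seqcount write sections consistent.
  	 */
  	local_irq_save(flags);
  #endif

  	/* ... existing stats start body unchanged ... */

  #ifdef CONFIG_LOCKDEP
  	local_irq_restore(flags);
  #endif
  }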

TBH I don't really like the idea of creating two lock classes only to
fix a false positive in lockdep. The code becomes too complex for a
feature that exists only to track stats. Moreover, it is a false
positive.

If possible, I'd try either of the two other options you suggested here.
They feel more digestible for a false-positive fix IMHO.

Best Regards,
- Maíra

---
  drivers/gpu/drm/v3d/v3d_drv.c   | 16 +++++++++++++++-
  drivers/gpu/drm/v3d/v3d_gem.c   | 15 ++++++++++++++-
  drivers/gpu/drm/v3d/v3d_sched.c | 29 +++++++++++++++++++++++++----
  3 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index d7ff1f5fa481..4a5d3ab1281b 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -106,6 +106,8 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
  static int
  v3d_open(struct drm_device *dev, struct drm_file *file)
  {
+	static struct lock_class_key v3d_stats_gpu_lock_class;
+	static struct lock_class_key v3d_stats_cpu_lock_class;
  	struct v3d_dev *v3d = to_v3d_dev(dev);
  	struct v3d_file_priv *v3d_priv;
  	struct drm_gpu_scheduler *sched;
@@ -118,13 +120,25 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
  	v3d_priv->v3d = v3d;

  	for (i = 0; i < V3D_MAX_QUEUES; i++) {
+		struct lock_class_key *key;
+		char *name;
+
  		sched = &v3d->queue[i].sched;
  		drm_sched_entity_init(&v3d_priv->sched_entity[i],
  				      DRM_SCHED_PRIORITY_NORMAL, &sched,
  				      1, NULL);

  		memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
-		seqcount_init(&v3d_priv->stats[i].lock);
+
+		if (i == V3D_CACHE_CLEAN || i == V3D_CPU) {
+			key = &v3d_stats_cpu_lock_class;
+			name = "v3d_client_stats_cpu_lock";
+		} else {
+			key = &v3d_stats_gpu_lock_class;
+			name = "v3d_client_stats_gpu_lock";
+		}
+
+		__seqcount_init(&v3d_priv->stats[i].lock, name, key);
  	}

  	v3d_perfmon_open_file(v3d_priv);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index da8faf3b9011..3567a80e603d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -242,16 +242,29 @@ v3d_invalidate_caches(struct v3d_dev *v3d)
  int
  v3d_gem_init(struct drm_device *dev)
  {
+	static struct lock_class_key v3d_stats_gpu_lock_class;
+	static struct lock_class_key v3d_stats_cpu_lock_class;
  	struct v3d_dev *v3d = to_v3d_dev(dev);
  	u32 pt_size = 4096 * 1024;
  	int ret, i;

  	for (i = 0; i < V3D_MAX_QUEUES; i++) {
  		struct v3d_queue_state *queue = &v3d->queue[i];
+		struct lock_class_key *key;
+		char *name;

  		queue->fence_context = dma_fence_context_alloc(1);
  		memset(&queue->stats, 0, sizeof(queue->stats));
-		seqcount_init(&queue->stats.lock);
+
+		if (i == V3D_CACHE_CLEAN || i == V3D_CPU) {
+			key = &v3d_stats_cpu_lock_class;
+			name = "v3d_stats_cpu_lock";
+		} else {
+			key = &v3d_stats_gpu_lock_class;
+			name = "v3d_stats_gpu_lock";
+		}
+
+		__seqcount_init(&queue->stats.lock, name, key);
  	}

  	spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index cc2e5a89467b..b2540e20d30c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -149,6 +149,27 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
  	preempt_enable();
  }

+/*
+ * We only need to disable local interrupts to appease lockdep who otherwise
+ * would think v3d_job_start_stats vs v3d_stats_update has an unsafe in-irq vs
+ * no-irq-off usage problem. This is a false positive because all the locks are
+ * per queue and stats type, and all jobs are completely one at a time
+ * serialised.
+ */
+static void
+v3d_job_start_stats_irqsave(struct v3d_job *job, enum v3d_queue queue)
+{
+#ifdef CONFIG_LOCKDEP
+	unsigned long flags;
+
+	local_irq_save(flags);
+#endif
+	v3d_job_start_stats(job, queue);
+#ifdef CONFIG_LOCKDEP
+	local_irq_restore(flags);
+#endif
+}
+
  static void
  v3d_stats_update(struct v3d_stats *stats, u64 now)
  {
@@ -194,6 +215,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
  	 * reuse the overflow attached to a previous job.
  	 */
  	V3D_CORE_WRITE(0, V3D_PTB_BPOS, 0);
+	v3d_job_start_stats(&job->base, V3D_BIN); /* Piggy-back on existing irq-off section. */
  	spin_unlock_irqrestore(&v3d->job_lock, irqflags);

  	v3d_invalidate_caches(v3d);
@@ -209,7 +231,6 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
  	trace_v3d_submit_cl(dev, false, to_v3d_fence(fence)->seqno,
  			    job->start, job->end);

-	v3d_job_start_stats(&job->base, V3D_BIN);
  	v3d_switch_perfmon(v3d, &job->base);

  	/* Set the current and end address of the control list.
@@ -261,7 +282,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
  	trace_v3d_submit_cl(dev, true, to_v3d_fence(fence)->seqno,
  			    job->start, job->end);

-	v3d_job_start_stats(&job->base, V3D_RENDER);
+	v3d_job_start_stats_irqsave(&job->base, V3D_RENDER);
  	v3d_switch_perfmon(v3d, &job->base);

  	/* XXX: Set the QCFG */
@@ -294,7 +315,7 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
  	trace_v3d_submit_tfu(dev, to_v3d_fence(fence)->seqno);

-	v3d_job_start_stats(&job->base, V3D_TFU);
+	v3d_job_start_stats_irqsave(&job->base, V3D_TFU);

  	V3D_WRITE(V3D_TFU_IIA(v3d->ver), job->args.iia);
  	V3D_WRITE(V3D_TFU_IIS(v3d->ver), job->args.iis);
@@ -339,7 +360,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
  	trace_v3d_submit_csd(dev, to_v3d_fence(fence)->seqno);

-	v3d_job_start_stats(&job->base, V3D_CSD);
+	v3d_job_start_stats_irqsave(&job->base, V3D_CSD);
  	v3d_switch_perfmon(v3d, &job->base);

  	csd_cfg0_reg = V3D_CSD_QUEUED_CFG0(v3d->ver);


