Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode

Mikko Perttunen <cyndis@xxxxxxxx> · Sat, 12 Sep 2020 16:31:32 +0300

On 9/12/20 3:53 PM, Dmitry Osipenko wrote:
12.09.2020 01:11, Mikko Perttunen пишет:
On 9/11/20 7:40 PM, Dmitry Osipenko wrote:
05.09.2020 13:34, Mikko Perttunen пишет:
+    } else {
+        struct host1x_job *failed_job = job;
+
+        host1x_job_dump(dev, job);
+
+        host1x_syncpt_set_locked(job->syncpt);
+        failed_job->cancelled = true;
+
+        list_for_each_entry_continue(job, &cdma->sync_queue, list) {
+            unsigned int i;
+
+            if (job->syncpt != failed_job->syncpt)
+                continue;
+
+            for (i = 0; i < job->num_slots; i++) {
+                unsigned int slot = (job->first_get/8 + i) %
+                            HOST1X_PUSHBUFFER_SLOTS;
+                u32 *mapped = cdma->push_buffer.mapped;
+
+                mapped[2*slot+0] = 0x1bad0000;
+                mapped[2*slot+1] = 0x1bad0000;

The 0x1bad0000 is a valid memory address on Tegra20.

The 0x60000000 is invalid phys address for all hardware generations.
It's used by grate-kernel [1] and VDE driver [2]. Note that the 0x6 <<
28 is also invalid Host1x opcode, while 0x1 should break CDMA parser
during of PB debug-dumping.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gem.h#L16

[2]
https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/staging/media/tegra-vde/iommu.c#L99

The VDE driver reserves the trapping IOVA addresses, I assume the Host1x
driver should do the same.

The 0x1bad0000's are not intended to be memory addresses, they are NOOP
opcodes (INCR of 0 words to offset 0xbad). I'll fix this to use proper
functions to construct the opcodes and add some comments. These need to
be NOOP opcodes so the command parser skips over these "cancelled" jobs
when the channel is resumed.

BTW, 0x60000000 is valid on Tegra194 and later.

At a quick glance it looked like a memory address :)

It does look a bit like one :) I'll add a comment to make it clear.

I'm now taking a closer look at this patch and it raises some more
questions, like for example by looking at the "On job timeout, we stop
the channel, NOP all future jobs on the channel using the same syncpoint
..." through the prism of grate-kernel experience, I'm not sure how it
could co-exist with the drm-scheduler and why it's needed at all. But I
think we need a feature-complete version (at least a rough version), so
that we could start the testing, and then it should be easier to review
and discuss such things.

The reason this is needed is that if a job times out and we don't do its 
syncpoint increments on the CPU, then a successive job incrementing that 
same syncpoint would cause fences to signal incorrectly. The job that 
was supposed to signal those fences didn't actually run; and any data 
those fences were protecting would still be garbage.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel