Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/12/20 3:53 PM, Dmitry Osipenko wrote:
12.09.2020 01:11, Mikko Perttunen пишет:
On 9/11/20 7:40 PM, Dmitry Osipenko wrote:
05.09.2020 13:34, Mikko Perttunen пишет:
+    } else {
+        struct host1x_job *failed_job = job;
+
+        host1x_job_dump(dev, job);
+
+        host1x_syncpt_set_locked(job->syncpt);
+        failed_job->cancelled = true;
+
+        list_for_each_entry_continue(job, &cdma->sync_queue, list) {
+            unsigned int i;
+
+            if (job->syncpt != failed_job->syncpt)
+                continue;
+
+            for (i = 0; i < job->num_slots; i++) {
+                unsigned int slot = (job->first_get/8 + i) %
+                            HOST1X_PUSHBUFFER_SLOTS;
+                u32 *mapped = cdma->push_buffer.mapped;
+
+                mapped[2*slot+0] = 0x1bad0000;
+                mapped[2*slot+1] = 0x1bad0000;

The 0x1bad0000 is a valid memory address on Tegra20.

The 0x60000000 is invalid phys address for all hardware generations.
It's used by grate-kernel [1] and VDE driver [2]. Note that the 0x6 <<
28 is also invalid Host1x opcode, while 0x1 should break CDMA parser
during of PB debug-dumping.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gem.h#L16


[2]
https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/staging/media/tegra-vde/iommu.c#L99


The VDE driver reserves the trapping IOVA addresses, I assume the Host1x
driver should do the same.


The 0x1bad0000's are not intended to be memory addresses, they are NOOP
opcodes (INCR of 0 words to offset 0xbad). I'll fix this to use proper
functions to construct the opcodes and add some comments. These need to
be NOOP opcodes so the command parser skips over these "cancelled" jobs
when the channel is resumed.

BTW, 0x60000000 is valid on Tegra194 and later.

At a quick glance it looked like a memory address :)

It does look a bit like one :) I'll add a comment to make it clear.


I'm now taking a closer look at this patch and it raises some more
questions, like for example by looking at the "On job timeout, we stop
the channel, NOP all future jobs on the channel using the same syncpoint
..." through the prism of grate-kernel experience, I'm not sure how it
could co-exist with the drm-scheduler and why it's needed at all. But I
think we need a feature-complete version (at least a rough version), so
that we could start the testing, and then it should be easier to review
and discuss such things.


The reason this is needed is that if a job times out and we don't do its syncpoint increments on the CPU, then a successive job incrementing that same syncpoint would cause fences to signal incorrectly. The job that was supposed to signal those fences didn't actually run; and any data those fences were protecting would still be garbage.



[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux