On 22/07/14 10:40, Daniel Vetter wrote:
On Tue, Jul 22, 2014 at 09:28:51AM +0200, Daniel Vetter wrote:
On Mon, Jul 21, 2014 at 03:03:07PM -0400, Jerome Glisse wrote:
On Mon, Jul 21, 2014 at 09:41:29PM +0300, Oded Gabbay wrote:
On 21/07/14 21:22, Daniel Vetter wrote:
On Mon, Jul 21, 2014 at 7:28 PM, Oded Gabbay <oded.gabbay@xxxxxxx> wrote:
I'm not sure whether we can do the same trick with the hw scheduler. But
then unpinning hw contexts will drain the pipeline anyway, so I guess we
can just stop feeding the hw scheduler until it runs dry. And then unpin
and evict.
So, I'm afraid we can't do this for AMD Kaveri because:
Well as long as you can drain the hw scheduler queue (and you can do
that, worst case you have to unmap all the doorbells and other stuff
to intercept further submission from userspace) you can evict stuff.
I can't drain the hw scheduler queue, as I can't do mid-wave preemption.
Moreover, if I use the dequeue request register to preempt a queue
during a dispatch it may be that some waves (wave groups actually) of
the dispatch have not yet been created, and when I reactivate the mqd,
they should be created but are not. However, this works fine if you use
the HIQ. The CP ucode correctly saves and restores the state of an
outstanding dispatch. I don't think we have access to the state from
software at all, so it's not a bug, it is "as designed".
I think Daniel is suggesting here to unmap the doorbell page and track
each write made by userspace to it, and while it is unmapped, wait for the
gpu to drain or use some kind of fence on a special queue. Once the GPU is
drained we can move the pinned buffer, then remap the doorbell and update it
to the last value written by userspace, which will resume execution at the
next job.
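As a rough illustration of that sequence (a toy model, not driver code), the
steps can be sketched in Python. Everything here is an assumption made for the
example: ToyQueue, begin_evict, finish_evict and all field names are
hypothetical, and a real implementation would trap the doorbell write through
a page fault rather than a method call.

```python
from dataclasses import dataclass


@dataclass
class ToyQueue:
    """Toy model of one user-mode queue. All names are hypothetical."""
    doorbell_mapped: bool = True
    hw_wptr: int = 0       # write pointer the hardware has seen
    shadow_wptr: int = 0   # last value userspace tried to write
    hw_rptr: int = 0       # how far the hardware has consumed
    buffer_location: str = "vram"

    def user_ring_doorbell(self, wptr: int) -> None:
        # Always record the write; forward it to hardware only while mapped.
        self.shadow_wptr = wptr
        if self.doorbell_mapped:
            self.hw_wptr = wptr

    def hw_step(self) -> None:
        # Hardware retires one already-submitted job, if any is pending.
        if self.hw_rptr < self.hw_wptr:
            self.hw_rptr += 1

    def is_drained(self) -> bool:
        return self.hw_rptr == self.hw_wptr


def begin_evict(queue: ToyQueue) -> None:
    # Step 1: unmap the doorbell so new submissions are only recorded.
    queue.doorbell_mapped = False


def finish_evict(queue: ToyQueue, new_location: str) -> None:
    # Step 2: the caller must have waited for the hardware to run dry.
    if not queue.is_drained():
        raise RuntimeError("queue still has work in flight")
    # Step 3: move the formerly pinned buffer.
    queue.buffer_location = new_location
    # Step 4: remap the doorbell and replay the last trapped write.
    queue.doorbell_mapped = True
    queue.hw_wptr = queue.shadow_wptr
```

In this model a write made between begin_evict and finish_evict is not lost:
it sits in shadow_wptr and is replayed once the doorbell is mapped again.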
Exactly, just prevent userspace from submitting more. And if you have
misbehaving userspace that submits too much, reset the gpu and tell it
that you're sorry but won't schedule any more work.
We have this already in i915 (since like all other gpus we're not
preempting right now) and it works. There's some code floating around to
even restrict the reset to _just_ the offending submission context, with
nothing else getting corrupted.
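To make the policy concrete, here is a minimal sketch of "stop feeding, wait,
and reset only the offender". It is a toy model and not the i915 code:
ToyContext, drain_or_ban, submit and the tick-based budget are all
hypothetical names and simplifications chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyContext:
    """Toy submission context. All names are hypothetical."""
    name: str
    pending_jobs: int
    job_cost: int = 1      # ticks each pending job needs to finish
    banned: bool = False


def drain_or_ban(contexts: List[ToyContext], budget_ticks: int) -> List[str]:
    """Give every context budget_ticks to drain; reset and ban the rest.

    Returns the names of the contexts that were banned.
    """
    banned = []
    for ctx in contexts:
        needed = ctx.pending_jobs * ctx.job_cost
        if needed > budget_ticks:
            # Did not drain in time: reset just this context and refuse
            # any further work from it.
            ctx.banned = True
            banned.append(ctx.name)
        # Either the work completed or the reset discarded it.
        ctx.pending_jobs = 0
    return banned


def submit(ctx: ToyContext, jobs: int) -> bool:
    # A banned context is told "no" instead of being scheduled.
    if ctx.banned:
        return False
    ctx.pending_jobs += jobs
    return True
```

A well-behaved context drains within the budget and can keep submitting
afterwards; only the one that overran is reset and rejected.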
You can do all this with the doorbells and unmapping them, but it's a
pain. Much easier if you have a real ioctl, and I haven't seen anyone with
perf data indicating that an ioctl would be too much overhead on linux.
Neither in this thread nor internally here at intel.
Aside: Another reason why the ioctl is better than the doorbell is
integration with other drivers. Yeah I know this is about compute, but
sooner or later someone will want to e.g. post-proc video frames between
the v4l capture device and the gpu mpeg encoder. Or something else fancy.
Then you want to be able to somehow integrate into a cross-driver fence
framework like android syncpts, and you can't do that without an ioctl for
the compute submissions.
-Daniel
I assume you are talking about interop between graphics and compute. For that,
we have a module that is now being tested, and it does indeed use an ioctl to
map a graphics object into the compute process address space. However, once
the translation is done, the rest of the work happens only in userspace.
Oded