Re: Introduce a new helper framework for buffer synchronization

Rob Clark <robdclark@xxxxxxxxx> · Mon, 13 May 2013 13:58:10 -0400

On Mon, May 13, 2013 at 1:18 PM, Inki Dae <inki.dae@xxxxxxxxxxx> wrote:
>
>
> 2013/5/13 Rob Clark <robdclark@xxxxxxxxx>
>>
>> On Mon, May 13, 2013 at 8:21 AM, Inki Dae <inki.dae@xxxxxxxxxxx> wrote:
>> >
>> >> In that case you still wouldn't give userspace control over the fences.
>> >> I don't see any way that can end well.
>> >> What if userspace never signals? What if userspace gets killed by oom
>> >> killer. Who keeps track of that?
>> >>
>> >
>> > In all cases, all kernel resources to user fence will be released by
>> > kernel once the fence is timed out: never signaling and process killing
>> > by oom killer makes the fence timed out. And if we use mmap mechanism you
>> > mentioned before, I think user resource could also be freed properly.
>>
>>
>> I tend to agree w/ Maarten here.. there is no good reason for
>> userspace to be *signaling* fences.  The exception might be some blob
>> gpu drivers which don't have enough knowledge in the kernel to figure
>> out what to do.  (In which case you can add driver private ioctls for
>> that.. still not the right thing to do but at least you don't make a
>> public API out of it.)
>>
>
> Please do not care whether those are generic or not. Let's see the following
> three things. First, it's cache operation. As you know, ARM SoC has ACP
> (Accelerator Coherency Port) and can be connected to DMA engine or similar
> devices. And this port is used for cache coherency between CPU cache and DMA
> device. However, most devices on ARM based embedded systems don't use the
> ACP port. So they need proper cache operations before and after DMA or CPU
> access when using a cacheable mapping. Actually, I see many Linux based
> platforms call cache control interfaces directly for that. I think the
> reason they do so is that the kernel isn't aware of when and how the CPU
> accessed memory.

I think we had kicked around the idea of giving dmabufs a
prepare/finish ioctl quite some time back.  This is probably something
that should be at least a bit decoupled from fences.  (Possibly
'prepare' waits for dma access to complete, but not the other way
around.)

And I did implement support in omapdrm for simulating coherency via
page fault-in / shoot-down..  It is one option that makes it
completely transparent to userspace, although there is some
performance cost, so I suppose it depends a bit on your use-case.

> And second, a user process has to do so many things when using shared
> memory with a DMA device. The user process should understand how the DMA
> device operates and when interfaces for controlling the DMA device are
> called. Such things would make the user application very complicated.
>
> And third, it's performance optimization for multimedia and graphics devices.
> As I mentioned already, we should consider sequential processing for buffer
> sharing between CPU and DMA device. This means that the CPU should stay
> idle until the DMA device has completed, and vice versa.
>
> That is why I proposed such user interfaces. Of course, these interfaces
> might still be ugly: Maarten already pointed this out and I agree with
> him. But there must be a better way. Don't you think we need something
> similar? With such interfaces, cache control and buffer synchronization can
> be performed at the kernel level. Moreover, the user application doesn't
> need to consider DMA device control anymore. Therefore, one thread can access
> a shared buffer and the other can control the DMA device with the shared
> buffer in parallel. We can really make the best use of CPU and DMA idle time.
> In other words, we can really make the best use of a multitasking OS, Linux.
>
> So could you please tell me whether there is any reason not to use a
> public API for it? I think we can add and use a public API if NECESSARY.

Well, for cache management, I think it is a better idea.. I didn't
really catch that this was the motivation from the initial patch, but
maybe I read it too quickly.  But cache management can be decoupled from
synchronization, because CPU access is not asynchronous.  For
userspace/CPU access to a buffer, you should:

  1) wait for buffer
  2) prepare-access
  3)  ... do whatever cpu access to buffer ...
  4) finish-access
  5) submit buffer for new dma-operation

I suppose you could combine the syscall for #1 and #2.. not sure if
that is a good idea or not.  But you don't need to.  And there is
never really any need for userspace to signal a fence.
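
To make the ordering above concrete, here is a rough sketch in C.  It is
only a userspace simulation: struct shared_buf, the buf_* helpers and
cpu_access_cycle() are hypothetical names invented for illustration, not
an existing kernel or dma-buf interface.  The "wait" step just pretends
the DMA completed, and the cache maintenance is reduced to a counter.
The point is only that each step is legal in exactly one state, and that
userspace never signals anything itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a shared buffer; not a real kernel type. */
enum buf_state {
    BUF_DMA_BUSY,   /* device may still be accessing the buffer */
    BUF_IDLE,       /* DMA done, CPU access not yet prepared */
    BUF_CPU_ACCESS, /* between prepare-access and finish-access */
    BUF_CPU_DONE    /* finish-access done, ready for new DMA */
};

struct shared_buf {
    enum buf_state state;
    unsigned char data[64];
    int cache_ops;  /* counts simulated cache maintenance operations */
};

/* 1) wait for buffer: stands in for blocking until the kernel says the
 *    DMA is finished.  In this simulation the DMA "completes" at once. */
int buf_wait(struct shared_buf *b)
{
    if (b->state == BUF_DMA_BUSY)
        b->state = BUF_IDLE;
    return b->state == BUF_IDLE ? 0 : -1;
}

/* 2) prepare-access: where the kernel would do any cache maintenance
 *    needed before the CPU touches the buffer. */
int buf_prepare_access(struct shared_buf *b)
{
    if (b->state != BUF_IDLE)
        return -1;
    b->cache_ops++;
    b->state = BUF_CPU_ACCESS;
    return 0;
}

/* 4) finish-access: where the kernel would do any cache maintenance
 *    needed before the device sees the buffer again. */
int buf_finish_access(struct shared_buf *b)
{
    if (b->state != BUF_CPU_ACCESS)
        return -1;
    b->cache_ops++;
    b->state = BUF_CPU_DONE;
    return 0;
}

/* 5) submit buffer for a new dma-operation. */
int buf_submit_dma(struct shared_buf *b)
{
    if (b->state != BUF_CPU_DONE)
        return -1;
    b->state = BUF_DMA_BUSY;
    return 0;
}

/* Steps 1-5 in order; returns 0 on success, -1 if any step is refused. */
int cpu_access_cycle(struct shared_buf *b, unsigned char fill)
{
    if (buf_wait(b) || buf_prepare_access(b))
        return -1;
    memset(b->data, fill, sizeof(b->data)); /* 3) the actual CPU access */
    if (buf_finish_access(b))
        return -1;
    return buf_submit_dma(b);
}
```

Calling cpu_access_cycle() on a buffer that starts in BUF_DMA_BUSY walks
through all five steps and leaves it in BUF_DMA_BUSY again.  Calling
buf_finish_access() without a matching buf_prepare_access() is rejected.
Steps 1 and 2 are kept as separate calls here, but they could be folded
into one without changing the rest.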

BR,
-R

> Thanks,
> Inki Dae
>
>>
>> BR,
>> -R
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>