Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Felipe,

On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
<felipe.contreras@xxxxxxxxx> wrote:
> Hi,
>
> I spent some time looking deeper into this patch series, and I have some doubts.
>
> On Sun, May 2, 2010 at 8:47 PM, Ohad Ben-Cohen <ohad@xxxxxxxxxx> wrote:
>> On Sun, May 2, 2010 at 4:17 PM, Felipe Contreras
>> <felipe.contreras@xxxxxxxxx> wrote:
>>> On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@xxxxxxxxxx> wrote:
>>>>   The patchset renames the flush ioctl to begin_dma_to_dsp and
>>>>   the invalidate ioctl to begin_dma_from_dsp. Both functions
>>>>   eventually call dma_map_sg, with the former requesting a
>>>>   DMA_BIDIRECTIONAL direction, and the latter requesting a
>>>>   DMA_FROM_DEVICE direction.
>>>>   In addition, the patchset adds two new APIs which calls dma_unmap_sg:
>>>>   end_dma_to_dsp and end_dma_from_dsp.
>>>>
>>>>   Ideally, there would be only a single begin_dma command and a single
>>>>   end_dma one, which would accept an additional parameter that will
>>>>   determine the direction of the transfer. Such an approach would be more
>>>>   versatile and cleaner, but it would also break all user space apps that
>>>>   use dspbridge today.
>>>
>>> If I understand correctly all user-space apps would be broken anyway
>>> because they are not issuing the end_dma calls. At the very least they
>>> need to be updated to use them.
>>
>> Basically you're right, but the patches currently silently allow
>> today's user space
>> to somehow work (because if you don't use dma bounce buffers and you feel lucky
>> about speculative prefetching then you might get away with not calling
>> dma unmap).
>> I did that on purpose, to ease testing & migration, but didn't
>> explicitely said it because
>>  frankly it's just wrong.
>
> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
> and I don't understand what dma_unmap_sg is supposed to do.

Portable code must use the dma_unmap_* APIs.

As you mentioned, one of the things it's crucial for
(but not used in our case) is bounce buffers:
e.g. when doing a DMA from a device using a buffer
that is out of the platform's DMA reach,
the DMA API uses bounce buffers: data DMA'ed to that
bounce buffer, and then the dma unmap copies it back
to the original buffer).

Another thing unmap is taking care of is speculative prefetch:
modern ARM processors can access the buffer during DMA
in order to speculatively prefetch data into the cache. Naturally
this can seriously break a DMA from a device, so unmap is
invalidating the caches to prevent this.

The current dspbridge cache API is mostly relying on the
application to do the right thing: application programmer should
decide when to clean, flush or invalidate the caches in order
to have a successful DMA operation.

This is prone to mistakes, it's not portable, and of course
not upstreamable.

Using the standard DMA API we take the freedom from the
application programmer - we just ask him/her to tell us before
a DMA operation begins, and after it ends. The DMA API
is then responsible to do the right thing.

> first some "safe buffer" is searched, and if there isn't any... then
> it depends on the direction whether something is actually done or not.
>
> I guess it depends whether our arch has dmabounce or not, which I have
> no idea, but if we do, wouldn't skiping dma_unmap calls leak some
> massive amount of "safe buffers"?
>
>>> Also, in Nokia we patched the bridgedriver to be able to have the 3
>>> operations available from user-space (clean, inv, and flush), so we
>>> would be very interested in having the direction of the transfer
>>> available.
>>
>> Thanks, that's important to know.
>
> It's not that critical, but I guess it can't hurt to do the right thing.
>
>> What do you say about the following plan then:
>> - I'll add a single pair of begin_dma and end_dma commands that will
>> take the additional
>> direction parameter as described above. I'll then covert the existing
>> flush & invalidate commands to use this begin_dma command supplying it
>> the appropriate direction argument.
>
> Sounds perfect, but I wonder if the deprecated flush & invalidate
> would really work. See previous comments.
>
>> - In a separate patch, I'll deprecate flush & invalidate entirely
>> (usage of this deprecated
>> API will fail, resulting in a guiding error message).
>>
>> We get a sane and versatile (and platform-independent) implementation
>> that always work,
>> but breaks user space. If anyone would need to work with current user space,
>> the deprecating patch can always be reverted locally to get back a
>> flush & invalidate
>> implementations that (somehow) work.
>
> User-space API is being broken all the time in dspbridge. The
> difference is that this one might require nontrivial changes. But it
> must be done.
>
> So, I tried your patches, and a simple test app worked fine without
> modification, but a real video decoding hanged the device
> completely... some spinlock was stuck. I don't know if it's because of
> your patches, or because of the state of the bridge at that point.
> I'll try first to rebase to the latest to have a better idea of what's
> happening.

I read at your latest posts that after rebasing to newest dspbridge
code the issue doesn't reproduce anymore ? Please tell me if it's back.

>
> Also, I noticed an important problem. Your code assumes that we would
> always have a bunch of scattered pages, however you are not taking
> into account bridge_brd_mem_map() where vm_flags have VM_IO... in that
> case get_user_pages() is skipped completely. This code-path is
> important in order to mmap framebuffer memory, which is contiguous.
> So, in this case there are no pages too keep track at all.
>
> This use-case is very important since the dspbridge mmap operation is
> very expensive, and due to GStreamer design, we must do it
> constantly... if the memory is contiguous (VM_IO), the mmap operation
> is really fast (or at least should be... in theory).

Thanks for catching this VM_IO code path. I'll take care of this
and resubmit the patches.

>
> Reading your patches I see the ioctl to start the dmap operations
> would simply error out, but from the looks of it, they would be
> failing already, which is weird, because we are already using
> framebuffer memory for video decoding + rendering. In gst-dsp we are
> not checking the success of clean/invalidate ioctls so it might be
> that it has been failing all this time and seems to work by pure luck.
>
> Anyway, I'll keep investigating and let you know if I find anything.


Again thanks for your review,
your comments are very helpful.

Ohad.


>
> Cheers.
>
> --
> Felipe Contreras
>
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux