Re: edma: "3-byte" transfers and masked writes in general

Matthijs van Duin <matthijsvanduin@xxxxxxxxx> · Fri, 7 Oct 2016 21:20:22 +0200

On 3 October 2016 at 18:29, Peter Ujfalusi <peter.ujfalusi@xxxxxx> wrote:
> Or not, with most peripherals we use constant addressing on the IP side and in
> the IP usually a register which tells the IP about the data type. This is the
> case for McASP at least. In case of 24 bit data we might have 1 byte of
> 'garbage' arriving to McASP it is going to ignore it.

Yeah, the only thing that could have caused real problems is if McASP
had decided to throw a bus error like hwspinlock does, but
evidently it doesn't, fortunately. Any garbage in byte 3 should get
masked off by the formatting unit anyway, and even if not, it's at
-144 dBFS so who cares. :)

> For us what matters is that the eDMA itself can read and write any alignment
> to/from memory and this is what we advertise via the DMAengine to clients.

Well, what triggered me a bit was that the thing being advertised was
actually a 3-byte "bus width", but especially after the further
exploration I've done I appreciate that trying to be exact about what
the IP and edma support seems pretty hopeless. Especially since the
path between them involves at least three different bus protocols, all
with different capabilities and semantics.

I'm not sure what the intended semantics of "bus width" in the
dmaengine API are anyway.

>> Not supporting 16-bit writes even though most fields of the
>> dma descriptors are 16-bit. Nicely done.
>
> I'm sure this is not that unique :(

I haven't encountered anything quite this bad actually. To add insult
to injury, it's just that local ram that needs special treatment in
the ethernet subsystem while its memory-mapped registers seem fine
with arbitrary accesses including masked writes.

A good runner-up in obnoxiousness is the L3 service network which
requires aligned 32-bit accesses, period. Even bursts (e.g. LDM or
LDRD) result in bus errors.

> BTW: what happens if you do the copy with CPU in unaligned manner to the
> ethernet dma descriptor memory? Is it going through fine or the same type of
> corruption happens?

If it is mapped as device memory then 32-bit-aligned STRD and STM work
correctly without resulting in corruption. Anything misaligned worse
than this results in an alignment fault and no bus activity.

If it is mapped as normal uncacheable memory then you seem to get
basically the same results as with edma, except the clobbered bytes
aren't repetitions of the same data but seem to be just random garbage
that was lying around in the cpu. Also in this case write combining
will really make your life miserable, since e.g. two separate 32-bit
stores to offsets 4 and 8 may get combined into a single misaligned
store unless you separate them with a memory barrier.
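
To make the write-combining point concrete, here is a minimal
user-space sketch of two 32-bit stores to offsets 4 and 8 kept apart
by a barrier. The helper name store_pair_unmerged is made up for
illustration, and atomic_thread_fence() is only a stand-in: whether it
is strong enough on a given core and mapping is an assumption, and
kernel code would use its own barrier or MMIO accessors instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: write two adjacent 32-bit words (byte offsets 4
 * and 8 from 'base') with a full fence in between, so that the two
 * stores are not issued back to back and are less likely to be merged
 * into one misaligned 64-bit write.
 */
static void store_pair_unmerged(volatile uint32_t *base,
                                uint32_t first, uint32_t second)
{
    /* The scheme only makes sense for a 32-bit-aligned base. */
    assert(((uintptr_t)base & 3u) == 0);

    base[1] = first;                            /* byte offset 4 */
    atomic_thread_fence(memory_order_seq_cst);  /* stand-in barrier */
    base[2] = second;                           /* byte offset 8 */
}
```

On ordinary RAM this is indistinguishable from two plain stores; the
barrier only matters for a mapping where the write buffer is allowed to
merge them.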

> In any case I don't see this as eDMA related issue, it is more SoC internal
> behaviour/integration issue and if some driver for an IP faces similar issue
> it is the IP driver's responsibility to use the DMA in a compatible way.

Well, yes and no. The problem is that they might specify a 32-bit bus
width and expect 32-bit alignment to be a sufficient criterion for
transfers to work, and indeed it would be if performed by the cpu
(when mapped as device memory) or a dma controller with 32-bit data
port (like dma4).

With edma however the transfers would additionally need either
- size power of two, aligned to size, or
- size and alignment some multiple of eDMA's port width;
or the peripheral needs to support masked writes, which is something a
driver probably doesn't know since it's not documented.
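
The two criteria above can be sketched as a small check a driver could
run before handing a transfer to edma. The function name and the
port_width parameter are hypothetical; the 16-byte port width used in
the usage note is an assumption for illustration only, not a value
taken from any TRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Returns true if a transfer of 'size' bytes at 'addr' satisfies either
 * criterion from the list above, i.e. it should not need masked writes:
 *   1. size is a power of two and addr is aligned to size, or
 *   2. size and addr are both multiples of the port width.
 * 'port_width' is in bytes and must be a power of two.
 */
static bool edma_xfer_avoids_masking(uintptr_t addr, size_t size,
                                     size_t port_width)
{
    assert(is_pow2(port_width));

    if (size == 0)
        return false;                   /* nothing sensible to transfer */
    if (is_pow2(size) && addr % size == 0)
        return true;                    /* criterion 1 */
    if (size % port_width == 0 && addr % port_width == 0)
        return true;                    /* criterion 2 */
    return false;
}
```

With an assumed 16-byte port, a 12-byte transfer at a 32-bit-aligned
address fails the check even though both its size and its alignment are
multiples of 32 bits, which is exactly the case a driver is unlikely to
anticipate.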

This is in fact a consequence of eDMA's incautious choice to
implement transfers with "unnatural" size/alignment (from eDMA's point
of view) by using masking rather than splitting the transfers, unlike
the cpu which uses masking only for normal uncacheable memory and
splitting for device memory.
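
For contrast, here is a rough sketch of what "splitting" means: an
awkward transfer is broken into naturally aligned power-of-two pieces,
each of which can go out as a full, unmasked access. This is purely
illustrative and not a description of what any particular hardware
does; struct chunk, split_natural and max_width are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chunk {
    uintptr_t addr;
    size_t size;
};

/*
 * Split [addr, addr + len) into chunks whose size is a power of two no
 * larger than 'max_width' and whose address is aligned to that size.
 * Writes the chunks to 'out' and returns how many were produced.
 */
static size_t split_natural(uintptr_t addr, size_t len, size_t max_width,
                            struct chunk *out, size_t out_cap)
{
    size_t n = 0;

    /* max_width must be a non-zero power of two. */
    assert(max_width != 0 && (max_width & (max_width - 1)) == 0);

    while (len > 0) {
        size_t sz = max_width;

        /* Shrink until the chunk is aligned and fits in what is left. */
        while ((addr & (sz - 1)) != 0 || sz > len)
            sz >>= 1;

        assert(n < out_cap);
        out[n].addr = addr;
        out[n].size = sz;
        n++;

        addr += sz;
        len -= sz;
    }
    return n;
}
```

For example, a 3-byte transfer at a 32-bit-aligned address becomes one
2-byte access followed by one 1-byte access, so the fourth byte is
never touched.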

Note I'm not calling for any change in particular, I just wanted to
raise awareness of this unexpected result. It still doesn't seem
hugely likely any driver will be bitten by this, but hopefully if
anyone is then this thread will be dug up.

> OK. In Linux we do not touch the TC, all setup is via CC

Of course. My code just submitted it directly to a TC because it was
convenient for testing and requires no setup whatsoever, you just
write the transfer request to the TC. It however requires you have
exclusive use of the TC and you need to poll for completion so it's
really not an efficient use of resources.

Matthijs