On 3 October 2016 at 18:29, Peter Ujfalusi <peter.ujfalusi@xxxxxx> wrote:
> Or not, with most peripherals we use constant addressing on the IP side and in
> the IP usually a register which tells the IP about the data type. This is the
> case for McASP at least. In case of 24 bit data we might have 1 byte of
> 'garbage' arriving to McASP it is going to ignore it.

Yeah the only thing that could have caused real problems is if McASP would
have decided to throw a bus error like hwspinlock does, but evidently it
doesn't, fortunately. Any garbage in byte 3 should get masked off by the
formatting unit anyway, and even if not, it's at -144 dBFS so who cares. :)

> For us what matters is that the eDMA itself can read and write any alignment
> to/from memory and this is what we advertise via the DMAengine to clients.

Well, what triggered me a bit was that the thing being advertised was
actually a 3-byte "bus width", but especially after the further exploration
I've done I appreciate that trying to be exact about what the IP and edma
support seems pretty hopeless. Especially since the path between them
involves at least three different bus protocols, all with different
capabilities and semantics.

I'm not sure what the intended semantics of "bus width" in the dmaengine api
is anyway.

>> Not supporting 16-bit writes even though most fields of the
>> dma descriptors are 16-bit. Nicely done.
>
> I'm sure this is not that unique :(

I haven't encountered anything quite this bad actually. To add insult to
injury, it's just that local ram that needs special treatment in the
ethernet subsystem while its memory-mapped registers seem fine with
arbitrary accesses including masked writes.

A good runner-up in obnoxiousness is the L3 service network which requires
aligned 32-bit accesses, period. Even bursts (e.g. LDM or LDRD) result in
bus errors.

> BTW: what happens if you do the copy with CPU in unaligned manner to the
> ethernet dma descriptor memory?
> Is it going through fine or the same type of
> corruption happens?

If it is mapped as device memory then 32-bit-aligned STRD and STM work
correctly without resulting in corruption. Anything misaligned worse than
this results in an alignment fault and no bus activity.

If it is mapped as normal uncacheable memory then you seem to get basically
the same results as with edma, except the clobbered bytes aren't repetitions
of the same data but seem to be just random garbage that was lying around in
the cpu. Also in this case write combining will really make your life
miserable, since e.g. two separate 32-bit stores to offsets 4 and 8 may get
combined into a single misaligned store unless you separate them with a
memory barrier.

> In any case I don't see this as eDMA related issue, it is more SoC internal
> behaviour/integration issue and if some driver for an IP faces similar issue
> it is the IP driver's responsibility to use the DMA in a compatible way.

Well, yes and no. The problem is that they might specify a 32-bit bus width
and expect 32-bit alignment to be a sufficient criterion for transfers to
work, and indeed it would be if performed by the cpu (when mapped as device
memory) or a dma controller with 32-bit data port (like dma4).

With edma however the transfers would additionally need either
 - size power of two, aligned to size, or
 - size and alignment some multiple of eDMA's port width;
or the peripheral needs to support masked writes, which is something a
driver probably doesn't know since it's not documented.

This is in fact a consequence of eDMA's incautious choice to implement
transfers with "unnatural" size/alignment (from eDMA's point of view) by
using masking rather than splitting the transfers, unlike the cpu which uses
masking only for normal uncacheable memory and splitting for device memory.

Note I'm not calling for any change in particular, I just wanted to raise
awareness of this unexpected result.
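To make the two size/alignment conditions above concrete, here is a minimal C sketch of the check a driver could do before handing a transfer to edma when the target is not known to accept masked writes. The helper names and the port_width parameter are hypothetical, purely for illustration; they are not part of the edma driver or the dmaengine API, and the real port width depends on the SoC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if n is a non-zero power of two. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: returns true if a transfer of `size` bytes at
 * bus address `addr` meets one of the two conditions from the text,
 * i.e. it should not need byte masking:
 *   - size is a power of two and addr is aligned to size, or
 *   - size and addr are both multiples of the port width.
 * `port_width` is in bytes and is an assumed, caller-supplied value.
 */
static bool edma_transfer_is_natural(uintptr_t addr, size_t size,
                                     size_t port_width)
{
    assert(port_width != 0);

    if (size == 0)
        return false;

    if (is_power_of_two(size) && addr % size == 0)
        return true;

    return size % port_width == 0 && addr % port_width == 0;
}
```

With an assumed 16-byte port, edma_transfer_is_natural(0x4, 8, 16) is false even though the transfer is 32-bit aligned, which is exactly the case a driver might not expect, while a 48-byte transfer starting on a 16-byte boundary passes.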
It still doesn't seem hugely likely any driver will be bitten by this, but
hopefully if anyone is then this thread will be dug up.

> OK. In Linux we do not touch the TC, all setup is via CC

Of course. My code just submitted it directly to a TC because it was
convenient for testing and requires no setup whatsoever, you just write the
transfer request to the TC. It however requires you have exclusive use of
the TC and you need to poll for completion so it's really not an efficient
use of resources.

Matthijs