On Wed, Nov 13, 2019 at 8:53 AM Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:

> +       /* Pack the residue into a 32bit word */
> +       for (i = 0; i < host->pio_write_residue_sz; i++) {
> +               val |= host->pio_write_residue[i];
> +               val <<= 8;
> +       }
> +       /* Top up with new data */
> +       for (i = 0; i < fill; i++) {
> +               val |= *ptr;
> +               val <<= 8;
> +               ptr++;
> +               remain--;
> +       }

I'm worried that I might have gotten this wrong.

iowrite32_rep() reads the data little-endian (native endianness)
from memory, does it not? Bytes [0 1 2 3] end up in the FIFO
like [3 2 1 0].

So it will pack the first byte into the lowest 8 bits, the second byte
into bits 8-15, etc.

So I should rewrite all the loops like this:

        for (i = 0; i < host->pio_write_residue_sz; i++) {
                val |= (host->pio_write_residue[i] << 24);
                val >>= 8;
        }

So I shift the value down from the high bits instead of the
other way around.

This also gives a pretty plausible hint at what might be wrong
with the DMA when the length is not divisible by 4.

As suggested by Stephan in another context, I will try to set up
my own test rig for this.

Yours,
Linus Walleij
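
A minimal userspace sketch of the byte order discussed above, assuming the
FIFO word should match what a little-endian 32-bit load of the same four
bytes would give. The names pack_le32() and pack_le32_shift_down() are made
up for illustration and are not in the driver. In the shift-down variant the
shift comes before the OR; shifting after the OR would push the first byte
out of the word once four bytes have gone in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the driver: pack up to four bytes so
 * that buf[0] lands in bits 0-7, buf[1] in bits 8-15, and so on. This is
 * the layout a little-endian 32-bit load of the same bytes would produce.
 */
static uint32_t pack_le32(const uint8_t *buf, size_t n)
{
	uint32_t val = 0;
	size_t i;

	assert(n <= 4);
	for (i = 0; i < n; i++)
		val |= (uint32_t)buf[i] << (8 * i);

	return val;
}

/*
 * Same layout built by shifting down from the high bits. Each step first
 * makes room at the top, then inserts the new byte there. The result is
 * only in its final position after exactly four bytes have been fed in.
 */
static uint32_t pack_le32_shift_down(const uint8_t buf[4])
{
	uint32_t val = 0;
	size_t i;

	for (i = 0; i < 4; i++) {
		val >>= 8;
		/* Cast before shifting so the byte is not promoted to signed int */
		val |= (uint32_t)buf[i] << 24;
	}

	return val;
}
```

Indexing the shift by byte position, as pack_le32() does, also handles a
partial word (residue plus fewer than four new bytes) without a final
fix-up shift.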