On Tue, Feb 27, 2018 at 7:05 PM, James Smart <james.smart@xxxxxxxxxxxx> wrote: > On 2/27/2018 12:58 AM, Arnd Bergmann wrote: > > So you point out a very real concern, as in most cases the source buffer is > a bytestream and the desire is to send the bytestream in the same byte order > as in memory. It turns out we're somewhat lucky as the driver's source > buffer wasn't a real bytestream, it was a cpu-endian relative structure so > writex behavior would be ok. However, the driver is confused as it wants to > use 64-bit copies, but the cpu-endian structure was based on 32-bit's at a > time when mapping to the hardware. So things are ok if using writel, but not > so ok if writeq. Ah, so it is a structure of only 32-bit integers? Obviously if you have 16-bit or 64-bit integers mixed in with the structure, things get even more complicated. For an array of 32-bit little-endian registers, this variant should work on all architectures: #include <linux/io-64-nonatomic-lo-hi.h> for (i = 0; i < q->entry_size; i += sizeof(uint64_t)) writeq_relaxed(*((uint32_t *)(tmp + i) | *((uint32_t *)(tmp + i + 1) << 32, (q->dpp_regaddr + i) On x86_64 and x86_32, that should do the exact same thing as your version, but it should also work efficiently on other architectures. The __raw_writeq() version I suggested earlier is not correct if we are dealing with an array of __le32 values in hardware. On powerpc, the writeq_relaxed() unfortunately still prevents write-combining, so that won't be efficient. > As the code stands write now - this feature and the routine is only used > when x86, thus LE, thus the whole LSB conversion by writeX works. It's a > mute point. > > Please accept the patch as proposed. We will address these other issues as > we include the "push" support for other architectures. Could you add an #ifdef and comment around the 'if (q->dpp_enable ...)' block then to make sure that if anybody tries to make it work on other architectures, they are aware of the problem? Arnd