On Fri, Feb 23, 2018 at 7:09 PM, David Laight <David.Laight@xxxxxxxxxx> wrote:
> From: Andy Shevchenko
>> Sent: 23 February 2018 16:51
>> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@xxxxxxxxxx> wrote:

>> The side-effect I referred previously is about tails, i.e. unaligned
>> bytes are transferred in portions
>> like
>> 7 on 64-bit will be 4 + 2 + 1,
>> 5 = 4 + 1
>
> on 64bit memcpy() is allowed to do:
>         (long *)(tgt+len)[-1] = (long *)(src+len)[-1];
>         rep_movsq(tgt, src, len >> 3);
> provided the length is at least 8.
>
> The misaligned PCIe transfer generates a single TLP covering 12 bytes with the
> relevant byte enables set for the first and last 32bit words.

But is that guaranteed on every type of bus?
memcpy_toio() is a generic helper, so first of all we need to be sure
what the CPU does on its side. That, I think, is pretty straightforward,
since it's all written in asm for the 64-bit case.

So, what about buses?

-- 
With Best Regards,
Andy Shevchenko