From: Andy Shevchenko
> Sent: 23 February 2018 17:13
> To: David Laight
> Cc: Arnd Bergmann; James Smart; Dick Kennedy; James E.J. Bottomley; Martin K. Petersen; Hannes
> Reinecke; Johannes Thumshirn; linux-scsi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
>
> On Fri, Feb 23, 2018 at 7:09 PM, David Laight <David.Laight@xxxxxxxxxx> wrote:
> > From: Andy Shevchenko
> >> Sent: 23 February 2018 16:51
> >> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> >> The side-effect I referred previously is about tails, i.e. unaligned
> >> bytes are transferred in portions
> >> like
> >> 7 on 64-bit will be 4 + 2 + 1,
> >> 5 = 4 + 1
> >
> > on 64bit memcpy() is allowed to do:
> >	(long *)(tgt+len)[-1] = (long *)(src+len)[-1];
> >	rep_movsq(tgt, src, len >> 3);
> > provided the length is at least 8.
> >
> > The misaligned PCIe transfer generates a single TLP covering 12 bytes with the
> > relevant byte enables set for the first and last 32bit words.
>
> But is it guaranteed on any type of bus?
> memcpy_toio() is a generic helper, so, first of all we need to be sure
> what CPU on its side does, this is I think is pretty straight forward
> since it's all written in asm for 64-bit case.

I've just done a compile test; on x86-64 memcpy_toio() generates a call
to memcpy() (checked with objdump -dr).
That is on a system running a 4.14 kernel, so is probably using the
system headers from that release.

I'd need to do a run-time test on a newer system to verify what the PCIe
slave sees - but I changed our driver to do its own copy loops in order
to avoid byte transfers some time ago.

FWIW I was originally doing copy_to/from_user() directly to PCIe memory
addresses!

On x86 'memory' on devices can always be accessed by simple instructions.
Hardware 'IO' addresses are not valid for memcpy_to/fromio().

	David
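
For reference, the two tail strategies discussed in the quoted text can be compared with a small userspace sketch. It runs on ordinary memory only, and both function names (copy_split_tail, copy_overlap_tail) are hypothetical and not taken from the kernel or from any driver. It only counts the stores each strategy issues and says nothing about what TLPs a PCIe device would actually see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy whole 8-byte words, then split the remaining
 * tail into 4 + 2 + 1 byte stores. Returns the number of stores issued.
 */
static unsigned copy_split_tail(unsigned char *dst, const unsigned char *src,
                                size_t len)
{
    unsigned stores = 0;

    while (len >= 8) {
        memcpy(dst, src, 8);
        dst += 8; src += 8; len -= 8;
        stores++;
    }
    if (len & 4) { memcpy(dst, src, 4); dst += 4; src += 4; stores++; }
    if (len & 2) { memcpy(dst, src, 2); dst += 2; src += 2; stores++; }
    if (len & 1) { *dst = *src; stores++; }
    return stores;
}

/*
 * Hypothetical helper: store the final 8 bytes first (possibly misaligned
 * and overlapping the body), then copy len >> 3 whole words from the start.
 * Requires len >= 8. Some bytes are written twice.
 */
static unsigned copy_overlap_tail(unsigned char *dst, const unsigned char *src,
                                  size_t len)
{
    unsigned stores = 0;
    uint64_t last;
    size_t i;

    assert(len >= 8);

    /* Equivalent of: (long *)(tgt+len)[-1] = (long *)(src+len)[-1]; */
    memcpy(&last, src + len - 8, 8);
    memcpy(dst + len - 8, &last, 8);
    stores++;

    /* Equivalent of: rep_movsq(tgt, src, len >> 3); */
    for (i = 0; i < (len >> 3); i++) {
        memcpy(dst + 8 * i, src + 8 * i, 8);
        stores++;
    }
    return stores;
}
```

For a 15-byte copy the split version issues four stores (8 + 4 + 2 + 1) and the overlapping version two, at the cost of writing one byte twice. Whether a double write is acceptable depends on the device, so treat this as an illustration of the trade-off rather than a recommendation.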