On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Is it just that this particular hardware always happened to trigger
> > > the ERMS case (ie "rep movsb")?
> >
> > This is the particular snippet in question:
> >
> >         memcpy_fromio(buf, priv->rsp, 6);
> >         expected = be32_to_cpup((__be32 *) &buf[2]);
> >         if (expected > count || expected < 6)
> >                 return -EIO;
>
> Ok, strange.
>
> So what *used* to happen is that the memcpy_fromio() would just expand
> as a "memcpy()", and in this case, gcc would then inline the memcpy().
> In fact, gcc does it as a 4-byte access and a two-byte access from
> what I can tell.

I verified, and it is exactly as you stated:

   0xffffffff814aaa33 <+51>:    mov    (%rax),%edx
   0xffffffff814aaa35 <+53>:    mov    %edx,0x0(%rbp)
   0xffffffff814aaa38 <+56>:    movzwl 0x4(%rax),%eax
   0xffffffff814aaa3c <+60>:    mov    %ax,0x4(%rbp)

And your new version does exactly the same thing to the first six bytes
(with different opcode, but the same memory access pattern).

/Jarkko
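
For anyone following along without the hardware, here is a rough user-space sketch of the access pattern discussed above: six bytes copied as one 4-byte and one 2-byte access, then a big-endian length read from offset 2 and range-checked. This is not the driver code. The names copy6_split(), read_be32() and check_header() are hypothetical, ordinary memory stands in for the MMIO region, and -1 stands in for -EIO. Whether a compiler actually turns each fixed-size memcpy() into a single load or store depends on the compiler and its flags.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy a 6-byte header as one 4-byte and one 2-byte
 * access. A compiler may inline each fixed-size memcpy() into a single
 * load or store, but that is not guaranteed.
 */
static void copy6_split(uint8_t *dst, const uint8_t *src)
{
	uint32_t head;	/* bytes 0..3 */
	uint16_t tail;	/* bytes 4..5 */

	memcpy(&head, src, sizeof(head));
	memcpy(&tail, src + 4, sizeof(tail));
	memcpy(dst, &head, sizeof(head));
	memcpy(dst + 4, &tail, sizeof(tail));
}

/*
 * Read a big-endian 32-bit value byte by byte, standing in for
 * be32_to_cpup() outside the kernel. Safe for unaligned pointers.
 */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Mirror the check from the quoted snippet. Returns the expected length,
 * or -1 where the driver would return -EIO.
 */
static long check_header(const uint8_t *rsp, uint32_t count)
{
	uint8_t buf[6];
	uint32_t expected;

	copy6_split(buf, rsp);
	expected = read_be32(&buf[2]);
	if (expected > count || expected < 6)
		return -1;
	return (long)expected;
}
```

Calling check_header() on a header whose bytes 2..5 encode a length of 10 returns 10 as long as count is at least 10. A length below 6 or above count is rejected.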