On Tue, Jan 29, 2019 at 03:20:16PM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> > On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> > <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Is it just that this particular hardware always happened to trigger
> > > > the ERMS case (ie "rep movsb")?
> > >
> > > This is the particular snippet in question:
> > >
> > >         memcpy_fromio(buf, priv->rsp, 6);
> > >         expected = be32_to_cpup((__be32 *) &buf[2]);
> > >         if (expected > count || expected < 6)
> > >                 return -EIO;
> >
> > Ok, strange.
> >
> > So what *used* to happen is that the memcpy_fromio() would just expand
> > as a "memcpy()", and in this case, gcc would then inline the memcpy().
> > In fact, gcc does it as a 4-byte access and a two-byte access from
> > what I can tell.
>
> I verified, and it is exactly as you stated:
>
>    0xffffffff814aaa33 <+51>:    mov    (%rax),%edx
>    0xffffffff814aaa35 <+53>:    mov    %edx,0x0(%rbp)
>    0xffffffff814aaa38 <+56>:    movzwl 0x4(%rax),%eax
>    0xffffffff814aaa3c <+60>:    mov    %ax,0x4(%rbp)
>
> And your new version does exactly the same thing to the first six bytes
> (with different opcode, but the same memory access pattern).

I think I have found the root cause:

        memcpy_fromio(&__rsp_pa, &priv->regs_t->ctrl_rsp_pa, 8);

This is from crb_map_io(). This should be read as a quad word. I'll
change it to ioread64() and see what happens.

I don't know why it even used memcpy_fromio() in the first place. I
guess that when I first implemented the driver, I used it for no logical
reason, and it has worked up until now.

/Jarkko
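
To illustrate the distinction drawn above between copying 8 bytes and
reading a quad word, here is a minimal userspace sketch. It is not the
driver code: copy_bytewise() and read_quad() are hypothetical helpers,
and ordinary memory stands in for the device register. On plain RAM both
return the same value; the only difference is how many loads are issued,
which is the property that can matter for a register that expects a
single 64-bit access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy n bytes using one 1-byte load per byte.
 * A generic copy routine is free to split a transfer like this.
 */
static void copy_bytewise(void *dst, const volatile void *src, size_t n)
{
	uint8_t *d = dst;
	const volatile uint8_t *s = src;

	while (n--)
		*d++ = *s++;
}

/*
 * Hypothetical helper: fetch the whole value with a single 64-bit load.
 * The pointer must be 8-byte aligned for this to be one access.
 */
static uint64_t read_quad(const volatile uint64_t *reg)
{
	assert(((uintptr_t)reg & 7) == 0);
	return *reg;
}
```

In the kernel the second form would correspond to the ioread64() call
mentioned above; the sketch only shows the difference in access width.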