On Thu, Jan 31, 2019 at 06:04:37PM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 31, 2019 at 02:26:06PM +0200, Jarkko Sakkinen wrote:
> > On Tue, Jan 29, 2019 at 03:20:16PM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> > > > On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> > > > <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Is it just that this particular hardware always happened to trigger
> > > > > > the ERMS case (ie "rep movsb")?
> > > > >
> > > > > This is the particular snippet in question:
> > > > >
> > > > >     memcpy_fromio(buf, priv->rsp, 6);
> > > > >     expected = be32_to_cpup((__be32 *) &buf[2]);
> > > > >     if (expected > count || expected < 6)
> > > > >             return -EIO;
> > > >
> > > > Ok, strange.
> > > >
> > > > So what *used* to happen is that the memcpy_fromio() would just expand
> > > > as a "memcpy()", and in this case, gcc would then inline the memcpy().
> > > > In fact, gcc does it as a 4-byte access and a two-byte access from
> > > > what I can tell.
> > >
> > > I verified, and it is exactly as you stated:
> > >
> > >    0xffffffff814aaa33 <+51>:    mov    (%rax),%edx
> > >    0xffffffff814aaa35 <+53>:    mov    %edx,0x0(%rbp)
> > >    0xffffffff814aaa38 <+56>:    movzwl 0x4(%rax),%eax
> > >    0xffffffff814aaa3c <+60>:    mov    %ax,0x4(%rbp)
> > >
> > > And your new version does exactly the same thing to the first six bytes
> > > (with different opcode, but the same memory access pattern).
> >
> > I think I have found the root cause:
> >
> > memcpy_fromio(&__rsp_pa, &priv->regs_t->ctrl_rsp_pa, 8);
> >
> > This is from crb_map_io(). This should be read as quad word.
> >
> > I'll change it to ioread64() and see what happens. I don't know why it
> > even has used memcpy_fromio() in the first place. I guess, when I first
> > implemented the driver, I used that for no logical reason, and it has
> > worked since up until now.
>
> No, cannot be it. If you couldn't read it in two dwords, then it would
> have been always broken with 32-bit build.
>
> Anyway, just in case, I will check what address it prints out.

Found something that *does* fix the issue. If I replace memcpy_*io() calls
with regular memcpy(), the driver works and all my tests pass.

/Jarkko
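
For anyone following along, here is a rough userspace sketch of what the
quoted snippet does with the first six bytes: one 4-byte access plus one
2-byte access (the pattern in the disassembly), then a big-endian length
read at offset 2 and the same bounds check. This is not the driver code.
copy6_dword_word(), read_be32() and response_length() are hypothetical
names, plain memory stands in for the MMIO region, and -1 stands in for
-EIO, so it says nothing about how real I/O memory behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 6 bytes as one 32-bit access followed by one
 * 16-bit access, mirroring the pattern gcc produced for the inlined
 * memcpy(). Ordinary memory only; this does not model MMIO.
 */
static void copy6_dword_word(uint8_t *dst, const uint8_t *src)
{
	uint32_t lo;
	uint16_t hi;

	assert(dst != NULL && src != NULL);

	memcpy(&lo, src, sizeof(lo));		/* bytes 0..3 */
	memcpy(dst, &lo, sizeof(lo));
	memcpy(&hi, src + 4, sizeof(hi));	/* bytes 4..5 */
	memcpy(dst + 4, &hi, sizeof(hi));
}

/* Portable big-endian 32-bit read, standing in for be32_to_cpup(). */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical wrapper around the quoted check: returns the length found
 * in the header, or -1 (standing in for -EIO) if it is out of bounds.
 */
static long response_length(const uint8_t *rsp, size_t count)
{
	uint8_t buf[6];
	uint32_t expected;

	copy6_dword_word(buf, rsp);
	expected = read_be32(&buf[2]);
	if (expected > count || expected < 6)
		return -1;
	return (long)expected;
}
```

With a header of 80 01 00 00 00 0a and count = 64, response_length()
returns 10. The same header with count = 8 fails the check.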