On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
<jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>
> > Is it just that this particular hardware always happened to trigger
> > the ERMS case (ie "rep movsb")?
>
> This is the particular snippet in question:
>
>         memcpy_fromio(buf, priv->rsp, 6);
>         expected = be32_to_cpup((__be32 *) &buf[2]);
>         if (expected > count || expected < 6)
>                 return -EIO;

Ok, strange.

So what *used* to happen is that the memcpy_fromio() would just expand
as a "memcpy()", and in this case, gcc would then inline the memcpy().

In fact, gcc does it as a 4-byte access and a two-byte access from
what I can tell.

Which is actually exactly the same as what memcpy_fromio() should do,
just using a different code sequence.

>         memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);

This one gets turned into an out-of-line "memcpy()" in the old world
order, which depending on size will do different things, but might be
a "rep movsb". Or it might be the software expansion that does
overlapping accesses and/or backwards copies.

In the new world order, it's the "memcpy_fromio()" that will first do
4-byte accesses for the main bulk of the copy, and then end up with a
two-byte and single-byte move to pad out the end.

> I guess it did in the first memcpy_fromio operation since it is less
> than a quad word, right? Not sure why the 2nd memcpy_fromio() operation
> has worked, though.

The first one seems to do the same thing now as it used to do, so I
don't *think* it should have mattered.

The second one looks like it is unaligned (offset 6), and doing the
4-byte io reads would fail if that device needs aligned accesses. The
old memcpy() *might* have done it with a "rep movsb" that would just
work (?).

              Linus