Nevermind I think I figured out the problem. It's the cache initializing stores, we can't do overlapping copies where dst <= src in all cases because of them. A store to a address modulo the cache line size (which for these instructions is 64 bytes), clears that whole line. But when we're doing these memmove() calls in SLAB/SLUB, we can clear some bytes at the end of the line before they've been read in. And reading over NG4memcpy, this _can_ happen, the main unrolled loop begins like this: load src + 0x00 load src + 0x08 load src + 0x10 load src + 0x18 load src + 0x20 store dst + 0x00 Assume dst is 64 byte aligned and let's say that dst is src - 8 for this memcpy() call, right? That store at the end there is the one to the first line in the cache line, thus clearing the whole line, which thus clobbers "src + 0x28" before it even gets loaded. I'm pretty sure this is what's happening. And it's only going to trigger if the memcpy() is 128 bytes or larger. I'll work on a fix. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>