On Tue, Nov 14, 2017 at 9:10 AM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> (* OPTION 1 *)
>> Store modified code (as data) into code segment;
>> Jump to new code or an intermediate location;
>> Execute new code;
>
> Good point, so this is likely why I was having trouble reproducing the
> single-threaded self-modifying code incoherent case. I did have a branch
> in there.

Actually, even *without* the branch, Intel has been very good at having precise I$ coherency. I think you can literally store to the next instruction, and Intel CPUs since the Pentium Pro would notice, take a micro-fault, and handle it correctly (the i486 and Pentium did not have that level of coherency, but a taken branch would flush the fetch buffer).

An in-order Atom probably has the old Pentium behavior, and you could see it there. But starting with the P6 and OoO execution, the "taken branch" thing meant very little, so Intel instead started doing the "store-vs-instruction fetch" coherency explicitly, which makes it precise.

AFAIK, the only fairly easy way to show incoherent I$ is to use virtual aliasing and store to a different virtual address, because the fetch buffer coherency is done by virtual address. But even then, it's only the fetch buffer (it's been called different things over the years; now it's a uop loop cache), not the L1 caches, so you get a very limited window of instructions.

And that fetch buffer is also where any cross-CPU incoherency would be, for the exact same reason.

Linus
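To make the virtual-aliasing point concrete, here is a toy model, a hypothetical sketch and not a description of any real Intel core: a small fetch buffer caches "instructions" keyed by *virtual* address, and the store-vs-fetch snoop compares only virtual addresses. The names `ToyFetchBuffer`, `page_map`, `PAGE_SIZE` and `capacity` are all invented for illustration. A store through the same virtual address invalidates the cached entry, while a store through an aliased virtual address (same physical page) does not, so a stale instruction survives until the entry ages out of the small buffer.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page size for the toy model


class ToyFetchBuffer:
    """Hypothetical toy model of a virtually-tagged fetch buffer."""

    def __init__(self, page_map, capacity=4):
        self.page_map = page_map      # virtual page number -> physical page number
        self.memory = {}              # physical address -> "instruction"
        self.buffer = OrderedDict()   # virtual address -> cached "instruction"
        self.capacity = capacity      # small, to model the limited window

    def _phys(self, va):
        page, offset = divmod(va, PAGE_SIZE)
        return self.page_map[page] * PAGE_SIZE + offset

    def store(self, va, insn):
        # The store itself updates memory by physical address...
        self.memory[self._phys(va)] = insn
        # ...but the snoop only invalidates a matching *virtual* address.
        self.buffer.pop(va, None)

    def fetch(self, va):
        if va in self.buffer:
            return self.buffer[va]    # hit: may be stale under aliasing
        insn = self.memory.get(self._phys(va))
        self.buffer[va] = insn
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)  # evict the oldest entry (FIFO)
        return insn
```

For example, with `cpu = ToyFetchBuffer({0: 7, 1: 7})` the virtual pages 0 and 1 alias physical page 7. After `cpu.store(0x10, "old")`, `cpu.fetch(0x10)` and `cpu.store(0x10, "new")`, a fetch of `0x10` sees `"new"`. A subsequent `cpu.store(0x1010, "newer")` hits the same physical address but a different virtual one, so `cpu.fetch(0x10)` still returns the stale `"new"` until enough other fetches push that entry out of the buffer.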