On Wed, Jun 19, 2019 at 11:55:41PM +0000, Vineet Gupta wrote:

> So we ensure a patched instruction never crosses a
> cache line - using .balign 4. This causes a slight mis-optimization that all
> patched instruction locations are forced to be 4 bytes aligned while ISA allows
> code to be 2 byte aligned. The cost is an extra NOP_S (2 bytes) - no big deal in
> grand scheme of things in IMO.

Right, so the scheme x86 uses (which I outlined in an earlier email)
allows you to get rid of those extra NOPs. Given jump labels are
typically used on fast paths, and NOPs still take up cycles to, at the
very least, fetch and decode, some people might care.

But if you're OK with having them, then sure, your scheme certainly
should work.
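
For illustration only, the alignment argument in the quoted text can be checked with a small standalone sketch. This is not kernel code: the helper names are hypothetical, and the 4-byte patched instruction and 64-byte cache line used in the usage note are assumptions picked for the example, not values taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true if the len-byte
 * instruction starting at addr spans two cache lines of line_size bytes.
 * line_size must be a power of two.
 */
bool crosses_cache_line(uintptr_t addr, unsigned int len,
			unsigned int line_size)
{
	assert(len > 0 && line_size > 0);
	assert((line_size & (line_size - 1)) == 0);

	uintptr_t mask = ~((uintptr_t)line_size - 1);

	/* Compare the line holding the first byte with the line holding the last. */
	return (addr & mask) != ((addr + len - 1) & mask);
}

/*
 * Hypothetical helper: tries every offset within one cache line that is a
 * multiple of align and reports whether a len-byte instruction placed there
 * could cross into the next line.
 */
bool any_placement_crosses(unsigned int align, unsigned int len,
			   unsigned int line_size)
{
	assert(align > 0);

	for (uintptr_t off = 0; off < line_size; off += align)
		if (crosses_cache_line(off, len, line_size))
			return true;

	return false;
}
```

Under those assumptions, any_placement_crosses(4, 4, 64) is false, which is the guarantee .balign 4 buys, while any_placement_crosses(2, 4, 64) is true because an instruction at offset 62 straddles the line boundary. Getting to 4-byte alignment from a 2-byte-aligned location is what costs the extra NOP_S.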