On Tue, Jan 14, 2025 at 02:38:42PM +0000, David Laight wrote: > From: Jiri Olsa > > Sent: 14 January 2025 14:03 > > > > hi, > > while checking on similar code for uprobes I was wondering if we > > can merge first 2 steps of instruction update in text_poke_bp_batch > > function. > > > > Basically the first step now would be to write int3 byte together > > with the rest of the bytes of the new instruction instead of doing > > that separately. And the second step would be to overwrite int3 > > byte with first byte of the new instruction. > > > > Would that work or do I miss some x86 detail that could lead to crash? > > I suspect it will 'crash and burn'. > > Consider what happens if there is a cache-line boundary in the > middle of an instruction. > (Actually an instruction fetch boundary will do.) > > cpu0: reads the old instructions from the old cache line. > cpu0: pipeline busy (or similar) so doesn't read the next cache line. > cpu1: writes the new instructions. > cpu0: reads the second cache line. > > cpu0 now has a mix of the old and new instruction bytes. > > Writing the int3 is safe - provided they don't return until > all the patching is over. > > But between writing the int3 (over the first opcode byte) and > updating anything else I suspect you need something that does > a complete synchronise between the cpu that discards any bytes > in the decode pipeline as well as flushing the I-cache (etc). > I suspect that requires an acked IPI. > > Very long cpu stalls are easy to generate. > Any read from PCIe will be slow (I've at fpga target that takes ~1us). > You'd need to be unlucky to be patching an instruction while one > was pending, but a DMA access might just be enough to cause grief. ok, thanks for all the details, jirka