> On Jun 19, 2023, at 10:09 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> But jit_text_alloc() can't do this, because the order of operations doesn't match. With jit_text_alloc(), the executable mapping shows up before the text is populated, so there is no atomic change from not-there to populated-and-executable. Which means that there is an opportunity for CPUs, speculatively or otherwise, to start filling various caches with intermediate states of the text, which means that various architectures (even x86!) may need serialization.
>
> For eBPF- and module- like use cases, where JITting/code gen is quite coarse-grained, perhaps something vaguely like:
>
> jit_text_alloc() -> returns a handle and an executable virtual address, but does *not* map it there
> jit_text_write() -> write to that handle
> jit_text_map() -> map it and synchronize if needed (no sync needed on x86, I think)

Andy, would you mind explaining why you think a sync is not needed? I have a "feeling" that perhaps TSO can guarantee something based on the order of the write and the page-table update. Is that the argument?

In this regard, one thing that I clearly do not understand is why *today* it is OK for users of bpf_arch_text_copy() not to call text_poke_sync(). Am I missing something?
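
To make the ordering under discussion concrete, here is a rough user-space analogy of the three-step flow quoted above. It is only a sketch, not the kernel interface: struct jit_text, the signatures, and the staging buffer are all hypothetical, and mmap()/mprotect() merely stand in for the kernel's page-table manipulation. The one property it tries to show is that the range becomes executable only after the text is fully populated. It does not answer whether a serializing step is still needed afterwards; code that actually jumps into the range on a non-x86 architecture would also need an instruction-cache flush first.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical user-space stand-in for the proposed "handle". */
struct jit_text {
    void *addr;             /* final address: reserved, but not accessible yet */
    unsigned char *staging; /* private buffer that receives all writes */
    size_t len;
};

/* Reserve the final address range with no access rights at all. */
static int jit_text_alloc(struct jit_text *h, size_t len)
{
    h->addr = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (h->addr == MAP_FAILED)
        return -1;
    h->staging = calloc(1, len);
    if (!h->staging) {
        munmap(h->addr, len);
        return -1;
    }
    h->len = len;
    return 0;
}

/* Writes go to the staging buffer only, never to the final address. */
static int jit_text_write(struct jit_text *h, size_t off,
                          const void *src, size_t n)
{
    if (off > h->len || n > h->len - off)
        return -1;
    memcpy(h->staging + off, src, n);
    return 0;
}

/*
 * Populate the final range while it is writable but not executable,
 * then flip it to read+execute as the very last step.
 */
static int jit_text_map(struct jit_text *h)
{
    assert(h->staging != NULL); /* mapping twice is a caller bug */
    if (mprotect(h->addr, h->len, PROT_READ | PROT_WRITE))
        return -1;
    memcpy(h->addr, h->staging, h->len);
    if (mprotect(h->addr, h->len, PROT_READ | PROT_EXEC))
        return -1;
    free(h->staging);
    h->staging = NULL;
    return 0;
}
```

Unlike the proposal, this sketch briefly exposes a writable, non-executable view at the final address, so it is not an atomic transition from not-there to populated-and-executable. It only guarantees that no executable mapping exists while the text is in an intermediate state.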