Hi Mark,

On Thu, Jun 22, 2023 at 10:23 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Wed, Jun 21, 2023 at 10:57:20PM +0200, Puranjay Mohan wrote:
> > On Wed, Jun 21, 2023 at 5:31 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > On Mon, Jun 19, 2023 at 10:01:21AM +0000, Puranjay Mohan wrote:
> > > > @@ -1562,34 +1610,39 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> > > >
> > > >       /* 3. Extra pass to validate JITed code. */
> > > >       if (validate_ctx(&ctx)) {
> > > > -             bpf_jit_binary_free(header);
> > > >               prog = orig_prog;
> > > > -             goto out_off;
> > > > +             goto out_free_hdr;
> > > >       }
> > > >
> > > >       /* And we're done. */
> > > >       if (bpf_jit_enable > 1)
> > > >               bpf_jit_dump(prog->len, prog_size, 2, ctx.image);
> > > >
> > > > -     bpf_flush_icache(header, ctx.image + ctx.idx);
> > > > +     bpf_flush_icache(ro_header, ctx.ro_image + ctx.idx);
> > >
> > > I think this is too early; we haven't copied the instructions into the
> > > ro_header yet, so that still contains stale instructions.
> > >
> > > IIUC the whole point of this is to pack multiple programs into shared ROX
> > > pages, and so there can be an executable mapping of the RO page at this point,
> > > and the CPU can fetch stale instructions through that.
> > >
> > > Note that *regardless* of whether there is an executable mapping at this point
> > > (and even if no executable mapping exists until after the copy), we at least
> > > need a data cache clean to the PoU *after* the copy (so fetches don't get a
> > > stale value from the PoU), and the I-cache maintenance has to happen on the VA
> > > the instructions will be executed from (or VIPT I-caches can still contain stale
> > > instructions).
> >
> > Thanks for catching this; it was a big miss on my side.
> >
> > I was able to reproduce the boot issue from the other thread on my
> > Raspberry Pi. I think it is connected to my incorrect I-cache
> > handling.
> >
> > As you rightly pointed out, we need to call bpf_flush_icache() after
> > copying the instructions to the ro_header, or the CPU can run
> > incorrect instructions.
> >
> > When I move the call to bpf_flush_icache() after
> > bpf_jit_binary_pack_finalize() (which does the copy to ro_header),
> > the boot issue is fixed. Would this change be enough to make this
> > work, or would I need to do more with the data cache as well to
> > catch other edge cases?
>
> AFAICT, bpf_flush_icache() calls flush_icache_range(). Despite its name,
> flush_icache_range() has d-cache maintenance, i-cache maintenance, and context
> synchronization (i.e. it does everything necessary).
>
> As long as you call that with the VAs the code will be executed from, that
> should be sufficient, and you don't need to do any other work.

Thanks for explaining this. After reading your explanation, I think
this should work.

bpf_jit_binary_pack_finalize() will copy the instructions from
rw_header to ro_header. After the copy, calling

        bpf_flush_icache(ro_header, ctx.ro_image + ctx.idx);

will perform the cache maintenance for the VAs in the ro_header, which
is where the code will be executed from.

I will send the v4 patchset with this change.

Thanks,
Puranjay
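
The ordering discussed above (copy into the ro_header first, then do
cache maintenance on the VAs the code executes from) can be sketched as
a small user-space mock. This is only an illustration of the call order,
not the arm64 JIT code: mock_pack_finalize(), mock_flush_icache(),
jit_finish_sketch() and the two buffers are hypothetical stand-ins, and
no real cache maintenance is performed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: they model only the ordering, not the kernel API. */
enum { N_INSNS = 4 };

uint32_t rw_image[N_INSNS];  /* writable staging buffer (models rw_header) */
uint32_t ro_image[N_INSNS];  /* models the ROX mapping the CPU executes from */
int copied;                  /* set once the copy to ro_image has happened */
int flushed_after_copy;      /* set if the flush ran after the copy */

/* Models bpf_jit_binary_pack_finalize(): copy staged instructions to ro. */
static int mock_pack_finalize(void)
{
	memcpy(ro_image, rw_image, sizeof(ro_image));
	copied = 1;
	return 0;
}

/* Models bpf_flush_icache(start, end), called with the executable VAs. */
static void mock_flush_icache(const void *start, const void *end)
{
	assert((const char *)start < (const char *)end);
	/* Record whether the instructions were already in place. */
	flushed_after_copy = copied;
}

/* Returns 0 if cache maintenance happened after the copy, -1 otherwise. */
int jit_finish_sketch(void)
{
	int i;

	for (i = 0; i < N_INSNS; i++)
		rw_image[i] = 0xd503201fu;  /* AArch64 NOP encoding */

	if (mock_pack_finalize())
		return -1;

	/* Flush the range the code runs from, not the staging buffer. */
	mock_flush_icache(ro_image, ro_image + N_INSNS);

	return flushed_after_copy ? 0 : -1;
}
```

Calling jit_finish_sketch() returns 0 only when the flush sees the
copied instructions; swapping the two calls inside it would make it
return -1, which mirrors the stale-instruction problem described above.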