On Thu, Nov 2, 2023 at 9:19 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Hi Puranjay,
>
> On Fri, Sep 08, 2023 at 02:43:18PM +0000, Puranjay Mohan wrote:
> > This will be used by the BPF JIT compiler to dump the JITed binary to an
> > RX huge page, and thus allow multiple BPF programs to share a huge (2MB)
> > page.
> >
> > The bpf_prog_pack allocator that implements the above feature allocates
> > an RX/RW buffer pair. The JITed code is written to the RW buffer and then
> > this function will be used to copy the code from the RW to the RX buffer.
> >
> > Signed-off-by: Puranjay Mohan <puranjay12@xxxxxxxxx>
> > Acked-by: Song Liu <song@xxxxxxxxxx>
> > ---
> >  arch/arm64/include/asm/patching.h |  1 +
> >  arch/arm64/kernel/patching.c      | 41 +++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/patching.h b/arch/arm64/include/asm/patching.h
> > index 68908b82b168..f78a0409cbdb 100644
> > --- a/arch/arm64/include/asm/patching.h
> > +++ b/arch/arm64/include/asm/patching.h
> > @@ -8,6 +8,7 @@ int aarch64_insn_read(void *addr, u32 *insnp);
> >  int aarch64_insn_write(void *addr, u32 insn);
> >
> >  int aarch64_insn_write_literal_u64(void *addr, u64 val);
> > +void *aarch64_insn_copy(void *dst, const void *src, size_t len);
> >
> >  int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
> >  int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
> > diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
> > index b4835f6d594b..243d6ae8d2d8 100644
> > --- a/arch/arm64/kernel/patching.c
> > +++ b/arch/arm64/kernel/patching.c
> > @@ -105,6 +105,47 @@ noinstr int aarch64_insn_write_literal_u64(void *addr, u64 val)
> >          return ret;
> >  }
> >
> > +/**
> > + * aarch64_insn_copy - Copy instructions into (an unused part of) RX memory
> > + * @dst: address to modify
> > + * @src: source of the copy
> > + * @len: length to copy
> > + *
> > + * Useful for JITs to dump new code blocks into unused regions of RX memory.
> > + */
> > +noinstr void *aarch64_insn_copy(void *dst, const void *src, size_t len)
> > +{
> > +        unsigned long flags;
> > +        size_t patched = 0;
> > +        size_t size;
> > +        void *waddr;
> > +        void *ptr;
> > +        int ret;
> > +
> > +        raw_spin_lock_irqsave(&patch_lock, flags);
> > +
> > +        while (patched < len) {
> > +                ptr = dst + patched;
> > +                size = min_t(size_t, PAGE_SIZE - offset_in_page(ptr),
> > +                             len - patched);
> > +
> > +                waddr = patch_map(ptr, FIX_TEXT_POKE0);
> > +                ret = copy_to_kernel_nofault(waddr, src + patched, size);
> > +                patch_unmap(FIX_TEXT_POKE0);
> > +
> > +                if (ret < 0) {
> > +                        raw_spin_unlock_irqrestore(&patch_lock, flags);
> > +                        return NULL;
> > +                }
> > +                patched += size;
> > +        }
> > +        raw_spin_unlock_irqrestore(&patch_lock, flags);
> > +
> > +        caches_clean_inval_pou((uintptr_t)dst, (uintptr_t)dst + len);
>
> As Xu mentioned, either this needs to use flush_icache_range() to IPI all CPUs
> in the system, or we need to make it the caller's responsibility to do that.
>
> Otherwise, I think this is functionally ok, but I'm not certain that it's good
> for BPF to be using the FIX_TEXT_POKE0 slot as that will serialize all BPF
> loading, ftrace, kprobes, etc against one another. Do we ever expect to load
> multiple BPF programs in parallel, or is that serialized at a higher level?

BPF loading is pretty much serialized by the verifier. It's a very slow operation.
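
For illustration only, here is a minimal sketch of how the JIT side could consume
this helper if the cache maintenance and IPI were made the caller's responsibility,
as suggested above. bpf_arch_text_copy() is the generic hook that bpf_prog_pack
uses; the -EINVAL error value and the placement of flush_icache_range() here are
assumptions for this sketch, not part of the patch:

#include <linux/bpf.h>          /* bpf_arch_text_copy() prototype */
#include <linux/err.h>          /* ERR_PTR() */
#include <asm/cacheflush.h>     /* flush_icache_range() */
#include <asm/patching.h>       /* aarch64_insn_copy() */

/*
 * Illustrative sketch, not the actual patch: wires aarch64_insn_copy()
 * into the generic bpf_arch_text_copy() hook used by bpf_prog_pack.
 * Assumes aarch64_insn_copy() drops its internal caches_clean_inval_pou()
 * call and leaves the maintenance to the caller.
 */
void *bpf_arch_text_copy(void *dst, void *src, size_t len)
{
        /* Copy the JITed code from the RW buffer into the RX region. */
        if (!aarch64_insn_copy(dst, src, len))
                return ERR_PTR(-EINVAL);

        /*
         * flush_icache_range() cleans/invalidates to the PoU and also
         * IPIs all online CPUs (kick_all_cpus_sync()), so every CPU
         * observes the new instructions before the program runs.
         */
        flush_icache_range((unsigned long)dst, (unsigned long)dst + len);

        return dst;
}

Moving the broadcast out of aarch64_insn_copy() like this would keep the helper a
pure copy primitive and let the JIT decide when (and how often) to pay for the IPI.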