> On Jan 20, 2022, at 8:59 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Thu, Jan 20, 2022 at 11:13 AM Song Liu <song@xxxxxxxxxx> wrote: >> >> From: Song Liu <songliubraving@xxxxxx> >> >> Use bpf_prog_pack allocator in x86_64 jit. >> >> The program header from bpf_prog_pack is read only during the jit process. >> Therefore, the binary is first written to a temporary buffer, and later >> copied to final location with text_poke_copy(). >> >> Similarly, jit_fill_hole() is updated to fill the hole with 0xcc using >> text_poke_copy(). >> >> Signed-off-by: Song Liu <songliubraving@xxxxxx> >> --- >> arch/x86/net/bpf_jit_comp.c | 134 +++++++++++++++++++++++++++--------- >> 1 file changed, 103 insertions(+), 31 deletions(-) >> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c >> index fe4f08e25a1d..6d97f7c24df2 100644 >> --- a/arch/x86/net/bpf_jit_comp.c >> +++ b/arch/x86/net/bpf_jit_comp.c >> @@ -216,11 +216,34 @@ static u8 simple_alu_opcodes[] = { >> [BPF_ARSH] = 0xF8, >> }; >> >> +static char jit_hole_buffer[PAGE_SIZE] = {}; > > Let's not waste a page filled with 0xcc. > The pack allocator will reserve 128 bytes at the front > and will round up the tail to 64 bytes. > So this can be a 128 byte array? > >> + >> static void jit_fill_hole(void *area, unsigned int size) >> +{ >> + struct bpf_binary_header *hdr = area; >> + int i; >> + >> + for (i = 0; i < roundup(size, PAGE_SIZE); i += PAGE_SIZE) { > > multi page 0xcc-ing? > Is it because bpf_jit_binary_alloc_pack() allocates 2MB > and then populates the whole thing with this? > Seems overkill. > 0xcc in the front of the prog and in the back is there > to catch JIT bugs. > No need to fill 2MB with it. I got this logic because current code memset(0xcc) for the whole buffer. We can change the logic to only 0xcc the first 128 bytes and the last 64 bytes. > > >> + int s; >> + >> + s = min_t(int, PAGE_SIZE, size - i); >> + text_poke_copy(area + i, jit_hole_buffer, s); >> + } >> + >> + /* >> + * bpf_jit_binary_alloc_pack cannot write size directly to the ro >> + * mapping. Write it here with text_poke_copy(). >> + */ >> + text_poke_copy(&hdr->size, &size, sizeof(size)); >> +} [...] >> @@ -2248,8 +2261,10 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs) >> >> struct x64_jit_data { >> struct bpf_binary_header *header; >> + struct bpf_binary_header *tmp_header; >> int *addrs; >> u8 *image; >> + u8 *tmp_image; > > Why add these two fields here? > With new logic header and image will be zero always? header and image point to a section of the 2MB page; while tmp_header and tmp_image point to a temporary buffer from kzalloc. We need them in x86_jit_data, so that we can reuse the temporary buffer between multiple calls of bpf_int_jit_compile(). It is used as: bpf_int_jit_compile(...) { /* ... */ jit_data = prog->aux->jit_data; if (!jit_data) { /* kzalloc jit_data */ } addrs = jit_data->addrs; if (addrs) { /* reuse previous jit_data */ } > Maybe rename them instead? > Or both used during JIT-ing? > >> int proglen; >> struct jit_context ctx; >> }; >> @@ -2259,6 +2274,7 @@ struct x64_jit_data { >> [...] >> } >> - prog->aux->extable = (void *) image + roundup(proglen, align); >> + if (header->size > bpf_prog_pack_max_size()) { >> + tmp_header = header; >> + tmp_image = image; >> + } else { >> + tmp_header = kzalloc(header->size, GFP_KERNEL); > > the header->size could be just below 2MB. > I don't think kzalloc() can handle that. Technically, kzalloc can handle 2MB allocation via: kzalloc() => kmalloc() => kmalloc_large() => kmalloc_order() But this would fail when the memory is fragmented. I guess we should use kvmalloc() instead? > >> + if (!tmp_header) { >> + bpf_jit_binary_free_pack(header); >> + header = NULL; >> + prog = orig_prog; >> + goto out_addrs; >> + } >> + tmp_header->size = header->size; >> + tmp_image = (void *)tmp_header + ((void *)image - (void *)header); > > Why is 'tmp_image' needed at all? > The above math can be done where necessary. We pass both image and tmp_image to do_jit(), as it needs both of them. I think maintaining a tmp_image variable makes the logic cleaner. We can remove it from x64_jit_data, I guess. > >> + } >> + prog->aux->extable = (void *)image + roundup(proglen, align); > > I suspect if you didn't remove the space between (void *) and image > the diff would be less confusing. > This line didn't change, right? Yeah, I forgot why I changed it in the first place. Let me undo it. Thanks, Song [...]