On Fri, Jan 21, 2022 at 4:23 PM Song Liu <songliubraving@xxxxxx> wrote: > > > > > On Jan 21, 2022, at 3:55 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Fri, Jan 21, 2022 at 11:49 AM Song Liu <song@xxxxxxxxxx> wrote: > >> > >> +static struct bpf_binary_header * > >> +__bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, > >> + unsigned int alignment, > >> + bpf_jit_fill_hole_t bpf_fill_ill_insns, > >> + u32 round_up_to) > >> +{ > >> + struct bpf_binary_header *hdr; > >> + u32 size, hole, start; > >> + > >> + WARN_ON_ONCE(!is_power_of_2(alignment) || > >> + alignment > BPF_IMAGE_ALIGNMENT); > >> + > >> + /* Most of BPF filters are really small, but if some of them > >> + * fill a page, allow at least 128 extra bytes to insert a > >> + * random section of illegal instructions. > >> + */ > >> + size = round_up(proglen + sizeof(*hdr) + 128, round_up_to); > >> + > >> + if (bpf_jit_charge_modmem(size)) > >> + return NULL; > >> + hdr = bpf_jit_alloc_exec(size); > >> + if (!hdr) { > >> + bpf_jit_uncharge_modmem(size); > >> + return NULL; > >> + } > >> + > >> + /* Fill space with illegal/arch-dep instructions. */ > >> + bpf_fill_ill_insns(hdr, size); > >> + > >> + hdr->size = size; > >> + hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)), > >> + PAGE_SIZE - sizeof(*hdr)); > > > > It probably should be 'round_up_to' instead of PAGE_SIZE ? > > Actually, some of these change is not longer needed after the following > change in v6: > > 4. Change fall back round_up_to in bpf_jit_binary_alloc_pack() from > BPF_PROG_MAX_PACK_PROG_SIZE to PAGE_SIZE. > > My initial thought (last year) was if we allocate more than 2MB (either > 2.1MB or 3.9MB), we round up to 4MB to save page table entries. > However, when I revisited this earlier today, I thought we should still > round up to PAGE_SIZE to save memory > > Right now, I am not sure which way is better. What do you think? If we > round up to PAGE_SIZE, we don't need split out __bpf_jit_binary_alloc(). The less code duplication the better. > > > >> + start = (get_random_int() % hole) & ~(alignment - 1); > >> + > >> + /* Leave a random number of instructions before BPF code. */ > >> + *image_ptr = &hdr->image[start]; > >> + > >> + return hdr; > >> +} > >> + > >> struct bpf_binary_header * > >> bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, > >> unsigned int alignment, > >> bpf_jit_fill_hole_t bpf_fill_ill_insns) > >> +{ > >> + return __bpf_jit_binary_alloc(proglen, image_ptr, alignment, > >> + bpf_fill_ill_insns, PAGE_SIZE); > >> +} > >> + > >> +struct bpf_binary_header * > >> +bpf_jit_binary_alloc_pack(unsigned int proglen, u8 **image_ptr, > >> + unsigned int alignment, > >> + bpf_jit_fill_hole_t bpf_fill_ill_insns) > >> { > >> struct bpf_binary_header *hdr; > >> u32 size, hole, start; > >> @@ -875,11 +1034,16 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, > >> * fill a page, allow at least 128 extra bytes to insert a > >> * random section of illegal instructions. > >> */ > >> - size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE); > >> + size = round_up(proglen + sizeof(*hdr) + 128, BPF_PROG_CHUNK_SIZE); > >> + > >> + /* for too big program, use __bpf_jit_binary_alloc. */ > >> + if (size > BPF_PROG_MAX_PACK_PROG_SIZE) > >> + return __bpf_jit_binary_alloc(proglen, image_ptr, alignment, > >> + bpf_fill_ill_insns, PAGE_SIZE); > >> > >> if (bpf_jit_charge_modmem(size)) > >> return NULL; > >> - hdr = bpf_jit_alloc_exec(size); > >> + hdr = bpf_prog_pack_alloc(size); > >> if (!hdr) { > >> bpf_jit_uncharge_modmem(size); > >> return NULL; > >> @@ -888,9 +1052,8 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, > >> /* Fill space with illegal/arch-dep instructions. */ > >> bpf_fill_ill_insns(hdr, size); > >> > >> - hdr->size = size; > > > > I'm missing where it's assigned. > > Looks like hdr->size stays zero, so free is never performed? > > This is read only memory, so we set it in bpf_fill_ill_insns(). There was a > comment in x86/bpf_jit_comp.c. I guess we also need a comment here. Ahh. Found it. Pls don't do it in fill_insn. It's the wrong layering. It feels that callbacks need to be redesigned. I would operate on rw_header here and use existing arch specific callback fill_insn to write into rw_image. Both insns during JITing and 0xcc on both sides of the prog. Populate rw_header->size (either before or after JITing) and then do single text_poke_copy to write the whole thing into the correct spot.