> On May 17, 2022, at 12:15 PM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Sun, 2022-05-15 at 22:40 -0700, Song Liu wrote:
>> Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
>> PMD_SIZE pages. This benefits system performance by reducing iTLB
>> miss rate. Benchmark of a real web service workload shows this change
>> gives another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
>> (which improves system throughput by ~0.5%).
>
> 0.7% sounds good as a whole. How sure are you of that +0.2%? Was this a
> big averaged test?

Yes, this was a test between two tiers with 10+ servers on each tier. We
took the average performance over a few hours of shadow workload.

>
>>
>> Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
>> set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
>> VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
>>
>> [1]
>> https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@xxxxxxxxx/
>>
>> Suggested-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
>
> As I said before, I think this will work functionally. But I meant it
> as a quick fix when we were talking about patching this up to keep it
> enabled upstream.
>
> So now, should we make VM_FLUSH_RESET_PERMS work properly with huge
> pages? The main benefit would be to keep the tear down of these types
> of allocations consistent for correctness reasons. The TLB flush
> minimizing differences are probably less impactful given the caching
> introduced here. At the very least though, we should have (or have
> already had) some WARN if people try to use it with huge pages.

I am not quite sure what exact work is needed here. Rick, would you have
time to enable VM_FLUSH_RESET_PERMS for huge pages? Given that the merge
window is coming soon, I guess we need the current workaround in 5.19.
(A rough sketch of the WARN you mention is at the bottom, below the
quoted patch.)

>
>> Signed-off-by: Song Liu <song@xxxxxxxxxx>
>> ---
>>  kernel/bpf/core.c | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index cacd8684c3c4..b64d91fcb0ba 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
>>  	void *ptr;
>>
>>  	size = BPF_HPAGE_SIZE * num_online_nodes();
>> -	ptr = module_alloc(size);
>> +	ptr = module_alloc_huge(size);
>
> This select_bpf_prog_pack_size() function always seemed weird - doing a
> big allocation and then immediately freeing. Can't it check a config
> for vmalloc huge page support?

Yes, it is weird. Checking a config is not enough here. We also need to
check vmap_allow_huge, which is controlled by the boot parameter
nohugeiomap. I haven’t found a better solution for this. (A sketch of
the probe logic is also at the bottom, below the quoted patch.)

Thanks,
Song

>
>>
>>  	/* Test whether we can get huge pages. If not just use PAGE_SIZE
>>  	 * packs.
>> @@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
>>  					       GFP_KERNEL);
>>  	if (!pack)
>>  		return NULL;
>> -	pack->ptr = module_alloc(bpf_prog_pack_size);
>> +	pack->ptr = module_alloc_huge(bpf_prog_pack_size);
>>  	if (!pack->ptr) {
>>  		kfree(pack);
>>  		return NULL;
>> @@ -890,7 +890,6 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
>>  	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
>>  	list_add_tail(&pack->list, &pack_list);
>>
>> -	set_vm_flush_reset_perms(pack->ptr);
>>  	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>>  	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>>  	return pack;
>> @@ -909,10 +908,9 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
>>
>>  	if (size > bpf_prog_pack_size) {
>>  		size = round_up(size, PAGE_SIZE);
>> -		ptr = module_alloc(size);
>> +		ptr = module_alloc_huge(size);
>>  		if (ptr) {
>>  			bpf_fill_ill_insns(ptr, size);
>> -			set_vm_flush_reset_perms(ptr);
>>  			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
>>  			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
>>  		}
>> @@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>>
>>  	mutex_lock(&pack_mutex);
>>  	if (hdr->size > bpf_prog_pack_size) {
>> +		set_memory_nx((unsigned long)hdr, hdr->size / PAGE_SIZE);
>> +		set_memory_rw((unsigned long)hdr, hdr->size / PAGE_SIZE);
>> 		module_memfree(hdr);
>> 		goto out;
>> 	}
>> @@ -975,6 +975,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
>>  	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
>>  				       bpf_prog_chunk_count(), 0) == 0) {
>>  		list_del(&pack->list);
>> +		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>> +		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
>>  		module_memfree(pack->ptr);
>>  		kfree(pack);
>>  	}
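
To expand on the vmap_allow_huge point above: that flag is private to
mm/vmalloc.c, so the only reliable check today is a runtime probe. The
quoted hunk only shows the top of select_bpf_prog_pack_size(), so treat
the following as an illustrative sketch of the idea rather than the
exact code in the patch; it assumes module_alloc_huge() from this series
and the existing is_vm_area_hugepages() helper:

static size_t select_bpf_prog_pack_size(void)
{
	size_t size;
	void *ptr;

	/* Probe: ask vmalloc for a huge-page-backed allocation. A pure
	 * config check is not enough because vmap_allow_huge (cleared
	 * by the nohugeiomap boot parameter) is private to mm/vmalloc.c.
	 */
	size = BPF_HPAGE_SIZE * num_online_nodes();
	ptr = module_alloc_huge(size);

	/* Fall back to PAGE_SIZE packs if we did not get huge pages. */
	if (!ptr || !is_vm_area_hugepages(ptr))
		size = PAGE_SIZE;

	vfree(ptr);	/* vfree(NULL) is a no-op */
	return size;
}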
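
And on the WARN idea for VM_FLUSH_RESET_PERMS: until huge pages are
handled properly, something along these lines in
set_vm_flush_reset_perms() (include/linux/vmalloc.h) would at least make
misuse visible. This is a rough, untested sketch reusing the existing
find_vm_area() and is_vm_area_hugepages() helpers, not a proposed patch:

static inline void set_vm_flush_reset_perms(void *addr)
{
	struct vm_struct *vm = find_vm_area(addr);

	if (vm) {
		/* The reset path does not handle PMD mappings yet, so
		 * flag any huge-page-backed user of this flag.
		 */
		WARN_ON_ONCE(is_vm_area_hugepages(addr));
		vm->flags |= VM_FLUSH_RESET_PERMS;
	}
}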