> On May 17, 2022, at 4:58 PM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Tue, 2022-05-17 at 21:08 +0000, Song Liu wrote:
>>> On May 17, 2022, at 12:15 PM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>>>
>>> On Sun, 2022-05-15 at 22:40 -0700, Song Liu wrote:
>>>> Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
>>>> PMD_SIZE pages. This benefits system performance by reducing iTLB miss
>>>> rate. Benchmark of a real web service workload shows this change gives
>>>> another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
>>>> (which improve system throughput by ~0.5%).
>>>
>>> 0.7% sounds good as a whole. How sure are you of that +0.2%? Was this a
>>> big averaged test?
>>
>> Yes, this was a test between two tiers with 10+ servers on each tier.
>> We took the average performance over a few hours of shadow workload.
>
> Awesome. Sounds great.
>
>>>> Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
>>>> set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
>>>> VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
>>>>
>>>> [1] https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@xxxxxxxxx/
>>>>
>>>> Suggested-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
>>>
>>> As I said before, I think this will work functionally. But I meant it
>>> as a quick fix when we were talking about patching this up to keep it
>>> enabled upstream.
>>>
>>> So now, should we make VM_FLUSH_RESET_PERMS work properly with huge
>>> pages? The main benefit would be to keep the tear down of these types
>>> of allocations consistent for correctness reasons. The TLB flush
>>> minimizing differences are probably less impactful given the caching
>>> introduced here. At the very least though, we should have (or have
>>> already had) some WARN if people try to use it with huge pages.
>>
>> I am not quite sure the exact work needed here. Rick, would you have
>> time to enable VM_FLUSH_RESET_PERMS for huge pages? Given the merge
>> window is coming soon, I guess we need current work around in 5.19.
>
> I would have hard time squeezing that in now. The vmalloc part is easy,
> I think I already posted a diff. But first hibernate needs to be
> changed to not care about direct map page sizes.

I guess I missed the diff, could you please send a link to it?

>>>> Signed-off-by: Song Liu <song@xxxxxxxxxx>
>>>> ---
>>>>  kernel/bpf/core.c | 12 +++++++-----
>>>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>>> index cacd8684c3c4..b64d91fcb0ba 100644
>>>> --- a/kernel/bpf/core.c
>>>> +++ b/kernel/bpf/core.c
>>>> @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
>>>>  	void *ptr;
>>>>
>>>>  	size = BPF_HPAGE_SIZE * num_online_nodes();
>>>> -	ptr = module_alloc(size);
>>>> +	ptr = module_alloc_huge(size);
>>>
>>> This select_bpf_prog_pack_size() function always seemed weird - doing a
>>> big allocation and then immediately freeing. Can't it check a config
>>> for vmalloc huge page support?
>>
>> Yes, it is weird. Checking a config is not enough here. We also need to
>> check vmap_allow_huge, which is controlled by boot parameter
>> nohugeiomap. I haven't got a better solution for this.
>
> It's too weird. We should expose whats needed in vmalloc.
> huge_vmalloc_supported() or something.

Yeah, this should work. I will get something like this in the next version.
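Something along these lines, maybe (untested sketch just to show the shape;
huge_vmalloc_supported() is only the name you suggested above, and the exact
config guard may need adjusting):

/* include/linux/vmalloc.h */
#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
bool huge_vmalloc_supported(void);
#else
static inline bool huge_vmalloc_supported(void)
{
	return false;
}
#endif

/*
 * mm/vmalloc.c, inside the existing #ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
 * block, next to vmap_allow_huge.
 */
bool huge_vmalloc_supported(void)
{
	/* vmap_allow_huge already reflects the boot parameter. */
	return vmap_allow_huge;
}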
>
> I'm also not clear why we wouldn't want to use the prog pack allocator
> even if vmalloc huge pages was disabled. Doesn't it improve performance
> even with small page sizes, per your benchmarks? What is the downside
> to just always using it?

With the current version, when huge pages are disabled, the prog pack
allocator will use 4kB pages for each pack. We still get about a 0.5%
performance improvement with 4kB prog packs.

Thanks,
Song
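P.S. To make the fallback concrete, an untested sketch of how
select_bpf_prog_pack_size() could pick the pack size once a helper like the
one above exists (again, huge_vmalloc_supported() is just a placeholder name,
and this leaves out details such as setting the pack mask):

static size_t select_bpf_prog_pack_size(void)
{
	/* Scale the pack size with the number of online nodes, as the
	 * current probe-based code does.
	 */
	if (huge_vmalloc_supported())
		return BPF_HPAGE_SIZE * num_online_nodes();

	/* No huge pages available: fall back to 4kB packs. */
	return PAGE_SIZE * num_online_nodes();
}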