Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs

Mike Rapoport <rppt@xxxxxxxxxx> · Sat, 3 Dec 2022 16:46:14 +0200

Hi Thomas,

On Thu, Dec 01, 2022 at 11:34:57PM +0100, Thomas Gleixner wrote:
> Mike!
> 
> On Thu, Dec 01 2022 at 22:23, Mike Rapoport wrote:
> > On Thu, Dec 01, 2022 at 10:08:18AM +0100, Thomas Gleixner wrote:
> >> On Wed, Nov 30 2022 at 08:18, Song Liu wrote:
> >> The symptom is iTLB pressure. The root cause is the way module
> >> memory is allocated, which in turn causes the fragmentation into
> >> 4k PTEs. That's the same problem for anything which uses module_alloc()
> >> to allocate space for text, e.g. kprobes, tracing, ...
> >
> > There's also dTLB pressure caused by the fragmentation of the direct map.
> > The memory allocated with module_alloc() is a priori mapped with 4k PTEs,
> > but setting RO in the vmalloc address space also updates the direct map
> > alias, and this causes splits of large pages.
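As a rough illustration of the direct-map effect, here is a minimal Python model. It assumes the direct map starts out mapped entirely with 2M pages and that any 2M page with only some of its 4K pages changed to RO must be split. `large_pages_split` is a hypothetical helper, not a kernel API, and real page-table code also deals with 1G mappings, TLB flushes and other details that are ignored here.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
PTES_PER_2M = PAGE_2M // PAGE_4K  # 512


def large_pages_split(ro_addrs):
    """Return the set of 2M-aligned direct-map regions that must be split.

    ro_addrs are physical addresses whose 4K direct-map alias is made RO.
    Model assumption: a 2M mapping survives only if all 512 of its 4K pages
    get the same new protection; a partial change forces a split.
    """
    touched = {}  # 2M base -> set of 4K page bases changed inside it
    for addr in ro_addrs:
        base_2m = addr - addr % PAGE_2M
        base_4k = addr - addr % PAGE_4K
        touched.setdefault(base_2m, set()).add(base_4k)
    return {base for base, pages in touched.items() if len(pages) < PTES_PER_2M}


if __name__ == "__main__":
    # Two module pages whose direct-map aliases live in different 2M regions.
    print(sorted(hex(b) for b in large_pages_split([0x1000, 0x203000])))
    # Every 4K page of one 2M region changes: no split needed in this model.
    print(large_pages_split([i * PAGE_4K for i in range(PTES_PER_2M)]))
```

In this model, making two scattered 4K pages RO splits two large pages, while changing all 512 pages of one 2M region splits none.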
> >
> > It's not clear what causes more performance improvement: avoiding splits of
> > large pages in the direct map or reducing iTLB pressure by backing text
> > memory with 2M pages.
> 
> From our experiments when doing the first version of the SKX retbleed
> mitigation, the main improvement came from reducing iTLB pressure simply
> because the iTLB cache is really small.
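To put "the iTLB cache is really small" into numbers, the sketch below counts how many TLB entries are needed to map a given amount of text with 4K versus 2M pages. It assumes the text is contiguous and page-aligned; the 2 MiB figure is illustrative, and actual iTLB capacities vary by CPU model and are not modeled here.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def tlb_entries_needed(text_bytes: int, page_size: int) -> int:
    """Pages (and hence TLB entries) needed to map text_bytes of contiguous,
    page-aligned text; rounds up, since a partial page still needs an entry."""
    return -(-text_bytes // page_size)  # ceiling division on integers


if __name__ == "__main__":
    text = 2 * 1024 * 1024  # illustrative: 2 MiB of module/BPF text
    print("4K pages:", tlb_entries_needed(text, PAGE_4K))  # 512
    print("2M pages:", tlb_entries_needed(text, PAGE_2M))  # 1
```

With 4K PTEs the same 2 MiB of hot text competes for 512 entries, while a single 2M mapping needs one.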
> 
> The kernel text placement is way beyond suboptimal. If you really do a
> hotpath analysis and (manually) place all hot code into one or two 2M
> pages, then you can achieve massive performance improvements way above
> the 10% range.
> 
> We currently have a master's student investigating this, but it will take
> some time until usable results materialize.
> 
> > If the major improvement comes from keeping the direct map intact, it
> > might be possible to mix data and text in the same 2M page.
> 
> No. That can't work.
> 
>     text = RX
>     data = RW or RO
> 
> If you mix this, then you end up with RWX for the whole 2M page. Not an
> option really as you lose _all_ protections in one go.
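A tiny model of why mixing text and data in one 2M page loses protections: a large-page mapping carries a single set of permission bits, so it has to grant the union of what everything inside it needs. `Prot` and `huge_page_prot` are hypothetical names for this sketch, not kernel interfaces such as pgprot_t.

```python
from enum import Flag


class Prot(Flag):
    """Simplified page protection bits (hypothetical, not kernel pgprot_t)."""
    NONE = 0
    R = 1
    W = 2
    X = 4


TEXT = Prot.R | Prot.X  # RX
DATA = Prot.R | Prot.W  # RW
RODATA = Prot.R         # RO


def huge_page_prot(contents):
    """A single 2M mapping has one set of protection bits, so it must grant
    the union of what every object placed in that page needs."""
    prot = Prot.NONE
    for needed in contents:
        prot |= needed
    return prot


if __name__ == "__main__":
    prot = huge_page_prot([TEXT, DATA])
    # Mixing text and writable data ends up both writable and executable.
    print(Prot.W in prot and Prot.X in prot)
```

Even mixing only text and read-only data yields RX for the whole page, so the read-only data becomes executable.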

I meant to take one 2M page from the direct map and split it into 4K PTEs in
the module address space. Then the protection could be done at the PTE level
after relocations etc., and it would save the dance with text poking. But if
mapping the code with 2M pages gives massive performance improvements, it's
surely better to keep 2M pages in the module address space.
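A minimal sketch of that idea, modeling only the module address space side: one 2M page mapped with 512 4K PTEs, each PTE getting its final protection once relocations are done. It assumes every text/data region is 4K-aligned so that no PTE has to mix permissions; `split_2m_into_ptes` and the string protections are hypothetical, and text poking, TLB flushing and the direct-map alias are not modeled.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
PTES_PER_2M = PAGE_2M // PAGE_4K  # 512


def split_2m_into_ptes(regions):
    """Map one 2M page with 512 4K PTEs and give each PTE its own protection.

    regions: list of (offset, length, prot) inside the 2M page, where prot is
    a string such as "RX", "RO" or "RW". Unused PTEs stay None (unmapped).
    """
    ptes = [None] * PTES_PER_2M
    for offset, length, prot in regions:
        if offset % PAGE_4K or length % PAGE_4K:
            raise ValueError("region not 4K-aligned: a PTE would mix permissions")
        if offset < 0 or offset + length > PAGE_2M:
            raise ValueError("region does not fit in the 2M page")
        for idx in range(offset // PAGE_4K, (offset + length) // PAGE_4K):
            if ptes[idx] is not None:
                raise ValueError("regions overlap")
            ptes[idx] = prot
    return ptes


if __name__ == "__main__":
    # Hypothetical layout: 8K of text, then 4K of rodata, then 4K of data.
    layout = [(0, 2 * PAGE_4K, "RX"), (2 * PAGE_4K, PAGE_4K, "RO"),
              (3 * PAGE_4K, PAGE_4K, "RW")]
    print(split_2m_into_ptes(layout)[:5])
```

The trade-off is the one discussed in this thread: per-PTE protections avoid mixing RX with RW, but the text is then reached through 4K iTLB entries again.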

> Thanks,
> 
>         tglx

-- 
Sincerely yours,
Mike.