On Fri, Nov 19, 2021 at 04:14:46AM +0000, Song Liu wrote:
>
>
> > On Nov 18, 2021, at 10:58 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Nov 18, 2021 at 06:39:49PM +0000, Song Liu wrote:
> >
> >>> You're going to have to do that anyway if you're going to write to the
> >>> directmap while executing from the alias.
> >>
> >> Not really. If you look at current version 7/7, the logic is mostly
> >> straightforward. We just make all the writes to the directmap, while
> >> calculate offset from the alias.
> >
> > Then you can do the exact same thing but do the writes to a temp buffer,
> > no different.
>
> There will be some extra work, but I guess I will give it a try.
>
> >
> >>>> The BPF program could have up to 1000000 (BPF_COMPLEXITY_LIMIT_INSNS)
> >>>> instructions (BPF instructions). So it could easily go beyond a few
> >>>> pages. Mapping the 2MB page all together should make the logic simpler.
> >>>
> >>> Then copy it in smaller chunks I suppose.
> >>
> >> How fast/slow is the __text_poke routine? I guess we cannot do it thousands
> >> of times per BPF program (in chunks of a few bytes)?
> >
> > You can copy in at least 4k chunks since any 4k will at most use 2
> > pages, which is what it does. If that's not fast enough we can look at
> > doing bigger chunks.
>
> If we do JIT in a buffer first, 4kB chunks should be fast enough.
>
> Another side of this issue is the split of linear mapping (1GB => many 4kB). If we only split to PMD, but not PTE, we can probably recover most of the regression. I will check this with Johannes.

__text_poke() shouldn't affect the fragmentation of the kernel mapping,
it's a user-space alias into the same physical memory. For all it cares
we're poking into GB pages.
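
As an illustration of the "JIT into a temp buffer, then copy in 4 kB
chunks" approach discussed above, here is a minimal sketch. It assumes
the image has already been built in a writable scratch buffer and is
copied into its final location with text_poke(), which internally maps
at most two pages per call. The helper name copy_jit_image() and its
parameters (dst, src, len) are hypothetical names for illustration, not
taken from the series.

        #include <linux/memory.h>       /* text_mutex */
        #include <linux/minmax.h>       /* min_t() */
        #include <linux/mm.h>           /* PAGE_SIZE */
        #include <linux/mutex.h>
        #include <asm/text-patching.h>  /* text_poke() */

        /*
         * Hypothetical helper: copy a JIT image from a temporary RW
         * buffer into its final mapping in 4 kB chunks. Any 4 kB chunk
         * crosses at most one page boundary, so each text_poke() call
         * needs to map at most two pages, which is what __text_poke()
         * supports.
         */
        static void copy_jit_image(void *dst, const void *src, size_t len)
        {
                size_t off = 0;

                mutex_lock(&text_mutex);        /* text_poke() requires text_mutex */
                while (off < len) {
                        size_t chunk = min_t(size_t, PAGE_SIZE, len - off);

                        text_poke(dst + off, src + off, chunk);
                        off += chunk;
                }
                mutex_unlock(&text_mutex);
        }

Whether one text_poke() call per 4 kB page is cheap enough for large BPF
images is exactly the open question in the thread; if it is not, the
chunk size could be raised as suggested above.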