On Tue, Sep 29, 2020 at 04:05:29PM +0300, Mike Rapoport wrote:
> On Fri, Sep 25, 2020 at 09:41:25AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > >
> > > Removing a PAGE_SIZE page from the direct map every time such page is
> > > allocated for a secret memory mapping will cause severe fragmentation of
> > > the direct map. This fragmentation can be reduced by using PMD-size pages
> > > as a pool for small pages for secret memory mappings.
> > >
> > > Add a gen_pool per secretmem inode and lazily populate this pool with
> > > PMD-size pages.
> >
> > What's the actual efficacy of this? Since the pmd is per inode, all I
> > need is a lot of inodes and we're in business to destroy the directmap,
> > no?
> >
> > Afaict there's no privs needed to use this, all a process needs is to
> > stay below the mlock limit, so a 'fork-bomb' that maps a single secret
> > page will utterly destroy the direct map.
>
> This indeed will cause 1G pages in the direct map to be split into 2M
> chunks, but I disagree with the 'destroy' term here. Citing the cover
> letter of an earlier version of this series:

It will drop them down to 4k pages. Given enough inodes, and allocating
only a single sekrit page per pmd, we'll shatter the directmap into 4k.

> I've tried to find some numbers that show the benefit of using larger
> pages in the direct map, but I couldn't find anything so I've run a
> couple of benchmarks from phoronix-test-suite on my laptop (i7-8650U
> with 32G RAM).

Existing benchmarks suck at this, but FB had a workload with a
deterministic enough performance regression to bisect it to a directmap
issue, fixed by:

  7af0145067bc ("x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text")

> I've tested three variants: the default with 28G of the physical
> memory covered with 1G pages, then I disabled 1G pages using
> "nogbpages" in the kernel command line and at last I've forced the
> entire direct map to use 4K pages using a simple patch to
> arch/x86/mm/init.c. I've made runs of the benchmarks with SSD and
> tmpfs.
>
> Surprisingly, the results do not show a huge advantage for large
> pages. For instance, here are the results for a kernel build with
> 'make -j8', in seconds:

Your benchmark should stress the TLB of your uarch, such that the
additional pressure added by the shattered directmap shows up. And no,
I don't have one either.

>                        |  1G    |  2M    |  4K
> -----------------------+--------+--------+---------
>  ssd, mitigations=on   | 308.75 | 317.37 | 314.9
>  ssd, mitigations=off  | 305.25 | 295.32 | 304.92
>  ram, mitigations=on   | 301.58 | 322.49 | 306.54
>  ram, mitigations=off  | 299.32 | 288.44 | 310.65

These results lack error data, but assuming they are significant, this
very much makes a case for 1G mappings. 5s on a kernel build is pretty
good.
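
For readers who want a concrete picture of the scheme the quoted patch
description refers to ("a gen_pool per secretmem inode, lazily populated
with PMD-size pages"), here is a minimal sketch. It is illustrative only,
not the actual secretmem code: the secretmem_ctx structure and the
function names are assumptions, and the direct map manipulation is left
as a comment.

	/*
	 * Illustrative sketch only, not the actual secretmem implementation.
	 * One gen_pool per inode; it is refilled lazily with PMD-size chunks,
	 * so the direct map is split at most once per 2M chunk rather than
	 * once per 4K secret page.
	 */
	#include <linux/genalloc.h>
	#include <linux/gfp.h>
	#include <linux/mm.h>
	#include <linux/numa.h>

	/*
	 * Hypothetical per-inode context; ->pool would be created at inode
	 * setup time with gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE).
	 */
	struct secretmem_ctx {
		struct gen_pool *pool;
	};

	static int secretmem_pool_refill(struct secretmem_ctx *ctx, gfp_t gfp)
	{
		unsigned int order = PMD_SHIFT - PAGE_SHIFT;
		struct page *page;
		int err;

		page = alloc_pages(gfp, order);
		if (!page)
			return -ENOMEM;

		/* Register the whole PMD-size chunk with the per-inode pool. */
		err = gen_pool_add(ctx->pool, (unsigned long)page_address(page),
				   PMD_SIZE, NUMA_NO_NODE);
		if (err) {
			__free_pages(page, order);
			return err;
		}

		/*
		 * Split the high-order page so the 4K pages handed out by the
		 * pool can be refcounted individually.
		 */
		split_page(page, order);

		/*
		 * Here the whole 2M chunk would be dropped from the direct map
		 * (e.g. via the set_direct_map_*() helpers): one split per PMD,
		 * not one per secret page.
		 */
		return 0;
	}

	static struct page *secretmem_alloc_page(struct secretmem_ctx *ctx,
						 gfp_t gfp)
	{
		unsigned long addr;

		/* Lazily refill the pool only when it has run dry. */
		if (!gen_pool_avail(ctx->pool) && secretmem_pool_refill(ctx, gfp))
			return NULL;

		addr = gen_pool_alloc(ctx->pool, PAGE_SIZE);
		return addr ? virt_to_page((void *)addr) : NULL;
	}

With something along these lines, a fork-bomb of single-page inodes still
costs one PMD split per inode, which is exactly the per-inode concern
raised above.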