On Fri, 15 Sept 2023 at 18:30, Lameter, Christopher
<cl@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 15 Sep 2023, Dave Hansen wrote:
>
> > What's the cost?
>
> The only thing that I see is 1-2% on kernel compilations (and "more on
> machines with lots of cores")?

I used kernel compilation time (wall clock time) as a benchmark while
preparing the series. Lower is better.

Intel Skylake, 112 cores:

LABEL          | COUNT | MIN     | MAX     | MEAN    | MEDIAN  | STDDEV
---------------+-------+---------+---------+---------+---------+--------
SLAB_VIRTUAL=n | 150   | 49.700s | 51.320s | 50.449s | 50.430s | 0.29959
SLAB_VIRTUAL=y | 150   | 50.020s | 51.660s | 50.880s | 50.880s | 0.30495
               |       | +0.64%  | +0.66%  | +0.85%  | +0.89%  | +1.79%

AMD Milan, 256 cores:

LABEL          | COUNT | MIN     | MAX     | MEAN    | MEDIAN  | STDDEV
---------------+-------+---------+---------+---------+---------+--------
SLAB_VIRTUAL=n | 150   | 25.480s | 26.550s | 26.065s | 26.055s | 0.23495
SLAB_VIRTUAL=y | 150   | 25.820s | 27.080s | 26.531s | 26.540s | 0.25974
               |       | +1.33%  | +2.00%  | +1.79%  | +1.86%  | +10.55%

Are there any specific benchmarks that you would be interested in seeing or
that are usually used for SLUB?

> Problems:
>
> - Overhead due to more TLB lookups
>
> - Larger amounts of TLBs are used for the OS. Currently we are trying to
>   use the maximum mappable TLBs to reduce their numbers. This presumably
>   means using 4K TLBs for all slab access.

Yes, we are using 4K pages for the slab mappings which is going to increase
TLB pressure. I also tried writing a version of the patch that uses 2M pages
which had slightly better performance, but that had its own problems. For
example most slabs are much smaller than 2M, so we would need to create and
map multiple slabs at once and we wouldn't be able to release the physical
memory until all slabs in the 2M page are unused which increases
fragmentation.

> - Memory may not be physically contiguous which may be required by some
>   drivers doing DMA.

In the current implementation each slab is backed by physically contiguous
memory, but different slabs that are adjacent in virtual memory might not be
physically contiguous. Treating objects allocated from two different slabs
as one contiguous chunk of memory is probably wrong anyway, right?

--
Matteo
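
For reference, the percentage rows in the benchmark tables can be reproduced
from the raw values as a plain relative change against the SLAB_VIRTUAL=n
baseline. The following is only a minimal sketch: the helper name pct_delta
is made up for this illustration and is not part of the series, and the
inputs are the mean wall clock times copied from the tables.

```python
# Relative change of SLAB_VIRTUAL=y versus SLAB_VIRTUAL=n, in percent.
# pct_delta is a hypothetical helper name used only for this illustration.

def pct_delta(baseline: float, patched: float) -> float:
    """Return how much larger `patched` is than `baseline`, in percent."""
    return (patched - baseline) / baseline * 100.0


# Mean wall clock times in seconds, copied from the benchmark tables:
# (SLAB_VIRTUAL=n, SLAB_VIRTUAL=y)
MEANS = {
    "Intel Skylake, 112 cores": (50.449, 50.880),
    "AMD Milan, 256 cores": (26.065, 26.531),
}

for machine, (off, on) in MEANS.items():
    print(f"{machine}: {pct_delta(off, on):+.2f}%")
```

The same helper applied to the MIN, MAX, MEDIAN and STDDEV columns should
give the remaining entries of the percentage rows.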