On Fri, 2012-11-23 at 09:56 +0000, Jan Beulich wrote:
> >>> On 22.11.12 at 18:37, "H. Peter Anvin" <hpa at zytor.com> wrote:
> > I actually talked to Ian Jackson at LCE, and mentioned among other

That was me actually (this happens surprisingly often ;-)).

> > things the bogosity of requiring a PUD page for three-level paging in
> > Linux -- a bogosity which has spread from Xen into native.  It's a page
> > wasted for no good reason, since it only contains 32 bytes worth of
> > data, *inherently*.  Furthermore, contrary to popular belief, it is
> > *not* a page table per se.
> >
> > Ian told me: "I didn't know we did that, and we shouldn't have to."
> > Here we have suffered this overhead for at least six years, ...
>
> Even the Xen kernel only needs the full page when running on a
> 64-bit hypervisor (now that we don't have a 32-bit hypervisor
> anymore, that of course basically means always).

I took an, admittedly very brief, look at it on the plane on the way
home and it seems like the requirement for a complete page on the
pvops-xen side comes from the !SHARED_KERNEL_PMD stuff (so still a Xen
related thing). This requires a struct page for the list_head it
contains (see pgd_list_add et al) rather than because of the use of the
page as a pgd as such.

> But yes, I too
> never liked this enforced over-allocation for native kernels (and
> was surprised that it was allowed in at all).

Completely agreed.

I did wonder if just doing something like:

-	pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
+	if (SHARED_KERNEL_PMD)
+		pgd = some_appropriate_allocation_primitive(sizeof(*pgd));
+	else
+		pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);

to pgd_alloc (+ the equivalent for the error path & free case, create
helper funcs as desired etc) would be sufficient to remove the over
allocation for the native case but haven't had time to properly
investigate.

Alternatively push the allocation down into paravirt_pgd_alloc to
taste :-/

Ian.
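
For reference, the "32 bytes" figure quoted above follows from the x86 PAE
layout: the top-level table has 4 entries of 8 bytes each. The sketch below
just spells out that arithmetic in plain userspace C. It is not kernel code,
and the PAE_* names and helper functions are made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values assume x86 PAE paging with 4 KiB pages. */
#define PAE_TOP_LEVEL_ENTRIES 4u     /* top-level table holds 4 entries */
#define PAE_PAGE_SIZE         4096u  /* size of one page in bytes */

typedef uint64_t pae_entry_t;        /* each PAE entry is 64 bits wide */

static_assert(sizeof(pae_entry_t) == 8, "PAE entries are 8 bytes");

/* Bytes the top-level table actually needs. */
static unsigned pae_top_level_bytes(void)
{
    return PAE_TOP_LEVEL_ENTRIES * (unsigned)sizeof(pae_entry_t);
}

/* Bytes left unused when a whole page is allocated to hold it. */
static unsigned pae_top_level_waste(void)
{
    return PAE_PAGE_SIZE - pae_top_level_bytes();
}
```

Under these assumptions, a full-page allocation uses 32 bytes and leaves the
remaining 4064 bytes of the page idle, which is the over-allocation being
discussed for the native case.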