On Tue, Apr 13, 2010 at 01:38:25PM +0200, Ingo Molnar wrote:
>
> * Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>
> > On Mon, Apr 12, 2010 at 04:22:30AM -0700, Arjan van de Ven wrote:
> > >
> > > Now hugepages have some interesting other advantages, namely they save
> > > pagetable memory..which for something like TPC-C on a fork based database
> > > can be a measureable win.
> >
> > It doesn't save pagetable memory (as in `grep MemFree /proc/meminfo`). [...]
>
> It does save in terms of CPU cache footprint. (which the argument was about)
> The RAM is wasted, but are always cache cold.

Definitely, thanks for further clarifying this, and this is why I've
been careful to specify "as in `grep MemFree..".

> i think it's very much interesting for 'pure' hugetlb mappings, as a next-step
> thing. It amounts to 8 bytes wasted per 4K page [0.2% of RAM wasted] - much
> more with the kind of aliasing that DBs frequently do - for hugetlb workloads
> it is basically roughly equivalent to a +8 bytes increase in struct page size
> - few MM hackers would accept that.
>
> So it will have to be fixed down the line.

It's exactly 4k wasted for each pmd set as pmd_trans_huge. Removing
the pagetable preallocation will be absolutely trivial as far as
huge_memory.c is concerned (takes like 1 minute of hacking), and in
fact it simplifies the code a bit. What will not be trivial is
handling the -ENOMEM retval from every place that calls
split_huge_page_pmd, which we can definitely address down the line
(ideally by removing split_huge_page_pmd).

The other benefit the current preallocation provides is that it
doesn't increase requirements on the PF_MEMALLOC pool: until we can
swap hugepages natively with huge-swapcache, in order to swap we need
to allocate the pte. Whoever tried this before (Dave IIRC) answered a
few emails ago that he also had to preallocate the pte to avoid
running into the above issue.
When he said that, it further confirmed to me that it's worth going
this way initially.

Also note: we're not wasting memory compared to when the pmd is not
huge; we just don't take advantage of the full potential of hugepages,
to keep things more manageable initially.

Thanks,
Andrea
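
As a rough check of the two overhead figures quoted above (8 bytes per 4K page, and 4k per pmd set as pmd_trans_huge), here is a minimal sketch in C. It assumes typical x86-64 values that the thread does not state explicitly: 4 KiB base pages, 8-byte ptes, and 512 ptes per pagetable page (so a huge pmd covers 2 MiB). The constant and function names are made up for illustration and are not kernel identifiers.

```c
/* Assumed x86-64 layout; these values are not taken from the message. */
#define BASE_PAGE_SIZE  4096UL  /* bytes in one 4K base page */
#define PTE_SIZE        8UL     /* bytes per pte */
#define PTES_PER_TABLE  512UL   /* ptes in one pagetable page */

/* Memory covered by one huge pmd: 512 * 4 KiB = 2 MiB under the assumptions. */
#define HUGE_PAGE_SIZE  (BASE_PAGE_SIZE * PTES_PER_TABLE)

/* Bytes of preallocated pagetable kept around for each huge pmd. */
static unsigned long pagetable_bytes_per_huge_pmd(void)
{
    return PTE_SIZE * PTES_PER_TABLE;
}

/* Overhead as a fraction of RAM, seen per 4K page (one pte per page). */
static double per_page_overhead(void)
{
    return (double)PTE_SIZE / (double)BASE_PAGE_SIZE;
}

/* The same overhead, seen per huge pmd (one pagetable page per 2 MiB). */
static double huge_pmd_overhead(void)
{
    return (double)pagetable_bytes_per_huge_pmd() / (double)HUGE_PAGE_SIZE;
}
```

Under these assumptions the two views agree: one preallocated pagetable page is 4096 bytes per huge pmd, and both fractions come out just under 0.2% of the mapped memory, which matches the "8 bytes per 4K page" and "exactly 4k per pmd" statements being the same cost counted two ways.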