On Fri, Mar 26, 2010 at 05:36:55PM +0000, Mel Gorman wrote:
> Correct, slab pages currently cannot migrate. Fragmentation within slab
> is minimised by anti-fragmentation by distinguishing between reclaimable
> and unreclaimable slab and grouping them appropriately. The objective is
> to put all the unmovable pages in as few 2M (or 4M or 16M) pages as
> possible. If min_free_kbytes is tuned as hugeadm
> --recommended-min_free_kbytes suggests, this works pretty well.

Awesome. So this feature is already part of your memory compaction code?
As you may have noticed, I haven't started looking deeply at your code
yet.

> Again, if min_free_kbytes is tuned appropriately, anti-frag should
> mitigate most of the fragmentation-related damage.

I don't see why this logic should be connected to min_free_kbytes.
Maybe I'll get it if I read the code. But min_free_kbytes is about the
PF_MEMALLOC pool and GFP_ATOMIC memory. I can't see any connection
between the min_free_kbytes setting and trying to keep all non
relocatable entries in the same HPAGE_PMD_SIZEd pages.

> On the notion of having a 2M front slab allocator, SLUB is not far off
> being capable of such a thing but there are risks. If a 2M page is
> dedicated to a slab, then other slabs will need their own 2M pages.
> Overall memory usage grows and you end up worse off.
>
> If you suggest that slab uses 2M pages and breaks them up for slabs, you
> are very close to what anti-frag already does. The difference might be

That's exactly what I meant, yes. Doing it per-slab would be useless.
The idea was for slub to simply call alloc_page_not_relocatable(order)
instead of alloc_page() every time it allocates an order <=
HPAGE_PMD_ORDER. That means this 2M page would be shared by _all_ slabs,
otherwise it wouldn't work. The page freeing could even go back in the
buddy initially. So the max waste would be 2M of RAM per CPU (the front
page has to be per-cpu to perform well).

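To make the front-allocator idea a bit more concrete, here is a rough
user-space sketch in C. It is only an illustration under assumptions:
struct front_page, alloc_huge_block() and the bump-style carving are made
up for this example, the signature of alloc_page_not_relocatable() is a
guess (only the name and the order argument appear above), and
aligned_alloc() merely stands in for asking the buddy allocator for an
order-HPAGE_PMD_ORDER page. It assumes 4K base pages and a 2M huge page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BASE_PAGE_SIZE  4096UL                    /* assumed 4K base page */
#define HPAGE_PMD_ORDER 9                         /* 2M / 4K, as on x86-64 */
#define HPAGE_PMD_NR    (1UL << HPAGE_PMD_ORDER)  /* 512 base pages */
#define HPAGE_PMD_SIZE  (HPAGE_PMD_NR * BASE_PAGE_SIZE)

/* Hypothetical per-cpu "front page": one 2M block shared by all slabs. */
struct front_page {
    char  *base;  /* start of the current 2M block, NULL if none yet */
    size_t next;  /* index of the next unused 4K page in that block */
};

/* Stand-in for getting a 2M-aligned, 2M-sized block from the buddy. */
static void *alloc_huge_block(void)
{
    return aligned_alloc(HPAGE_PMD_SIZE, HPAGE_PMD_SIZE);
}

/*
 * Hand out 2^order contiguous base pages from the shared front page.
 * When the current block cannot fit the request, a fresh 2M block is
 * taken. This sketch simply drops the unused tail of the old block;
 * it does not model freeing pages back to the buddy.
 */
void *alloc_page_not_relocatable(struct front_page *fp, unsigned int order)
{
    size_t nr, start;

    assert(order <= HPAGE_PMD_ORDER);
    nr = (size_t)1 << order;
    /* Keep buddy-style natural alignment: round up to a multiple of nr. */
    start = (fp->next + nr - 1) & ~(nr - 1);

    if (fp->base == NULL || start + nr > HPAGE_PMD_NR) {
        fp->base = alloc_huge_block();
        if (fp->base == NULL)
            return NULL;
        start = 0;
    }
    fp->next = start + nr;
    return fp->base + start * BASE_PAGE_SIZE;
}
```

A caller would keep one zero-initialised struct front_page per CPU and
pass it on every allocation, so that every slab carves its pages out of
the same 2M block rather than each slab owning its own.
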
> that slab would guarantee that the 2M page is only used for slab. Again,
> you could force this situation with anti-frag but the decision was made
> to allow a certain amount of fragmentation to avoid the memory overhead
> of such a thing. Again, tuning min_free_kbytes + anti-fragmentation gets
> much of what you need.

Well, if this 2M page is shared by other non relocatable entities, that
might be even better in some scenarios (maybe worse in others), but I'm
totally fine with a more elaborate approach. Clearly some driver could
also start to call alloc_pages_not_relocatable() and then it'd also
share the same memory as slab. I think it has to be a universally
available feature, just like you implemented. Except right now the main
problem is slab, so that's the first user for sure ;).

> Arguably, min_free_kbytes should be tuned appropriately once it's detected
> that huge pages are in use. It would not be hard at all, we just don't do it.
>
> Stronger guarantees on layout are possible but not done today because of
> the cost.

Could you elaborate on what "guarantees on layout" means?

> > Basically the buddy allocator will guarantee the slab will
> > generate as much fragmentation as possible because it does its best to
> > keep the high order pages for who asks for them.
>
> Again, already does this up to a point. rmqueue_fallback() could refuse to
> break up small contiguous pages for slab to force better layout in terms of
> fragmentation but it costs heavily when memory is low because you now have to
> reclaim (or relocate) more pages than necessary to satisfy anti-fragmentation.

I guess this will require a sysfs control. Do you have a
/sys/kernel/mm/defrag directory or something? If hugepages are
absolutely mandatory (like with hypervisor-only usage), it is worth
invoking memory compaction to satisfy what I call the "front allocator"
and give a full 2M page to slab instead of using the already available
fragment, falling back to rmqueue_fallback() only if defrag fails.

> Sounds very similar to anti-frag again.

Indeed.

> You could force such a situation by always having X number of lower blocks
> MIGRATE_UNMOVABLE and forcing a situation where fallback never happens to those
> areas. You'd need to do some juggling with counters and watermarks. It's not
> impossible and I considered doing it when anti-fragmentation was introduced
> but again, there was insufficient data to support such a move.

Agreed. I also like a more dynamic approach: the whole idea of
transparent hugepage is that the admin does nothing, no reservation, and
in this case no decision about how much memory should be
MIGRATE_UNMOVABLE.

Looking forward to seeing transparent hugepage take full advantage of
your patchset!

Thanks,
Andrea