On Mon, Apr 12, 2010 at 09:18:56AM +0300, Pekka Enberg wrote:
> On Mon, Apr 12, 2010 at 9:09 AM, Nick Piggin <npiggin@xxxxxxx> wrote:
> >> I think Andrea and Mel and you demonstrated that while defrag is futile in
> >> theory (we can always fill up all of RAM with dentries and there's no 2MB
> >> allocation possible), it seems rather usable in practice.
> >
> > One problem is that you need to keep a lot more memory free in order
> > for it to be reasonably effective. Another thing is that the problem
> > of fragmentation breakdown is not just a one-shot event that fills
> > memory with pinned objects. It is a slow degradation.
> >
> > Especially when you use something like SLUB as the memory allocator
> > which requires higher order allocations for objects which are pinned
> > in kernel memory.
>
> I guess we'd need to merge the SLUB defragmentation patches to fix that?

No, that's a different problem. And SLUB 'defragmentation' isn't really
defragmentation, it is just selective reclaim.

Reclaimable slab memory allocations are not the problem. The problem is
the ones that you can't reclaim.

The problem is this:

- Memory gets fragmented by allocation of pinned pages within larger
  ranges so that we cannot allocate that large range.

- Anti-frag improves this by putting pinned pages in one set of ranges
  and unpinned pages in another. So the ranges of unpinned pages can get
  reclaimed to use a larger range.

- However there is still an underlying problem of pinned pages causing
  fragmentation within their ranges.

- If you require higher order allocations for pinned pages especially,
  then you will end up with your pinned ranges becoming fragmented and
  unable to satisfy the higher order allocation. So you must expand your
  pinned ranges into unpinned ones.

If you only do 4K slab allocations, then things get better, however it
can of course still break down if the pinned allocation requirement
grows large.
It's really hard to control this because it includes anything from open
files to radix tree nodes to page tables and anything that any driver or
subsystem allocates with kmalloc.

Basically, if you were going to add another level of indirection to
solve that, you may as well just go ahead and do nonlinear mappings of
the kernel memory with page tables, so you'd only have to fix up places
that require translated addresses rather than everything that touches
KVA. This would still be a big headache.
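
To make the points in the list above concrete, here is a toy model in
Python. It is only a sketch and not kernel code: it assumes 4K pages
grouped into 2MB ranges of 512 pages, an 8-range machine, and that every
unpinned page can be reclaimed. The names (free_blocks_after_reclaim,
place_scattered, place_grouped, has_free_run) are made up for this
illustration.

```python
PAGES_PER_BLOCK = 512  # assumed: one 2MB range of 4K pages
NUM_BLOCKS = 8         # assumed: a tiny 16MB machine


def free_blocks_after_reclaim(pinned_pages):
    """Count whole ranges left once every unpinned page is reclaimed."""
    pinned_blocks = {page // PAGES_PER_BLOCK for page in pinned_pages}
    return NUM_BLOCKS - len(pinned_blocks)


def place_scattered(count):
    """No grouping: the i-th pinned page lands in range i % NUM_BLOCKS."""
    return {
        (i % NUM_BLOCKS) * PAGES_PER_BLOCK + i // NUM_BLOCKS
        for i in range(count)
    }


def place_grouped(count):
    """Anti-frag style grouping: pinned pages are packed from page 0 up."""
    return set(range(count))


def has_free_run(pinned_pages, first, last, order):
    """True if [first, last) has an aligned free run of 2**order pages.

    'first' is assumed to be a multiple of 2**order.
    """
    size = 1 << order
    return any(
        all(page not in pinned_pages for page in range(start, start + size))
        for start in range(first, last - size + 1, size)
    )


if __name__ == "__main__":
    # Eight pinned pages are enough to spoil every 2MB range when scattered.
    print(free_blocks_after_reclaim(place_scattered(8)))
    # Grouped, the same eight pages cost only one range.
    print(free_blocks_after_reclaim(place_grouped(8)))

    # A pinned range that was full and then had every other page freed:
    # half of it is free, yet no order-1 (2 page) allocation fits.
    half_used = set(range(0, PAGES_PER_BLOCK, 2))
    print(has_free_run(half_used, 0, PAGES_PER_BLOCK, 0))
    print(has_free_run(half_used, 0, PAGES_PER_BLOCK, 1))

    # So the order-1 pinned allocation spills into an unpinned range.
    print(free_blocks_after_reclaim(half_used | {512, 513}))
```

The last step is the one that matters: the pinned range still has 256
free pages, but the higher order pinned allocation has to take over a
second range anyway. With order-0 (4K) pinned allocations only, the same
request would have fit in the holes.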