On Wed, Jul 04, 2012 at 03:26:15PM +0800, Lai Jiangshan wrote: > > <SNIP> > > Different from ZONE_MOVABLE: it can be used for any given memroyblock. > > Lai Jiangshan (3): > use __rmqueue_smallest when borrow memory from MIGRATE_CMA > add MIGRATE_HOTREMOVE type > add online_movable > > arch/tile/mm/init.c | 2 +- > drivers/acpi/acpi_memhotplug.c | 3 +- > drivers/base/memory.c | 24 +++++++---- > include/linux/memory.h | 1 + > include/linux/memory_hotplug.h | 4 +- > include/linux/mmzone.h | 37 +++++++++++++++++ > include/linux/page-isolation.h | 2 +- > mm/compaction.c | 6 +- > mm/memory-failure.c | 8 +++- > mm/memory_hotplug.c | 36 +++++++++++++--- > mm/page_alloc.c | 86 ++++++++++++++++----------------------- > mm/vmstat.c | 3 + > 12 files changed, 136 insertions(+), 76 deletions(-) > I apologise for my crap review of the first patch to date. It was atrociously bad form and one of the reasons my review was so superficial was because I was aware of the problem below. It's pretty severe, we've encountered it on other occasions and it led me to dismiss the series quickly without adequate explanation or close review when I should have taken the time to explain it. The reason ZONE_MOVABLE exists is because of page reclaim. MIGRATE_CMA or any migrate type that is MIGRATE_CMA-like is not understood by reclaim currently and is not addressed by this series just from looking the diffstat (no changes to vmscan.c). In low memory situations, it's actually fine and the system appears to work well. The problem is either when the MIGRATE_CMA-like area is large and is either completely free or is the only source of pages that can be reclaimed. In these cases, MIGRATE_UNMOVABLE and MIGRATE_RECLAIMABLE allocations fail because their lists and fallback lists are empty. However, if it enters direct reclaim or wakes kswapd the watermarks are fine and reclaim does nothing. Depending on implementation details this causes either a loop or OOM. Minimally the watermark checks need to take the MIGRATE_CMA area into account but even then it is still fragile. If kswapd and direct reclaim are forced to reclaim pages, there is no guarantee they will reclaim pages that are usable by MIGRATE_UNMOVABLE or MIGRATE_RECLAIMABLE. To handle this you must either keep reclaiming pages until it works (easy to implement but disruptive to the system) or scan the LRU lists searching for suitable pages (higher CPU usage, LRU age disruption, will require the entire zone to be scanned in the OOM case which will be slow and subject to races and possible false OOMs). When these situations occur, it is very difficult to debug because it just looks like a hang and the exact triggering situations will be different. If the allocation then fails due to insufficient usable memory, the resulting OOM message will be harder to read because it'll show OOM when there are plenty of pages free. This can be addressed by clear accounting and informative messages of course but to be very clear it might be necessary to walk all the buddy lists to identify how many of the free pages were MIGRATE_CMA. You could use separate accounting of course but then you have accounting and memory overhead instead. In the case of CMA, this issue is less of a problem but it was discussed before CMA was merged. CMAs use case means that it is not likely to suffer severely because of the expected size of the region, how its used and how many slab allocations are expected on the systems it targets. It's far worse for memory hotplug because if the bulk of your memory is memory hotplugged, you may not be able to use it for metadata-intensive workloads for example which will result in bug reports. You could have 90% free memory and be unable to use any of it because you cannot increase the size of slab leading to odd corner cases. ZONE_MOVABLE is not great, but it handles the reclaim issues in a straight-forward manner, OOM is handled quickly because the whole system does not have to be scanned to detect the situation and the OOM messages are easy to read. If you want to replace it with MIGRATE_CMA or something MIGRATE_CMA-like, you need to take these issues into account or at the very least explain in detail why it is not an issue. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html