On Wed, Aug 22, 2012 at 03:57:45PM +0800, qiuxishi wrote: > On 2012-8-22 11:34, Minchan Kim wrote: > > Hello Xishi, > > > > On Tue, Aug 21, 2012 at 08:12:05PM +0800, qiuxishi wrote: > >> From: Xishi Qiu <qiuxishi@xxxxxxxxxx> > >> > >> When offline a section, we move all the free pages and pcp into MIGRATE_ISOLATE list first. > >> start_isolate_page_range() > >> set_migratetype_isolate() > >> drain_all_pages(), > >> > >> Here is a problem, it is not sure that pcp will be moved into MIGRATE_ISOLATE list. They may > >> be moved into MIGRATE_MOVABLE list because page_private() maybe 2. So when finish migrating > >> pages, the free pages from pcp may be allocated again, and faild in check_pages_isolated(). > >> drain_all_pages() > >> drain_local_pages() > >> drain_pages() > >> free_pcppages_bulk() > >> __free_one_page(page, zone, 0, page_private(page)); > >> > >> If we add move_freepages_block() after drain_all_pages(), it can not sure that all the pcp > >> will be moved into MIGRATE_ISOLATE list when the system works on high load. The free pages > >> which from pcp may immediately be allocated again. > >> > >> I think the similar bug described in http://marc.info/?t=134250882300003&r=1&w=2 > > > > Yes. I reported the problem a few month ago but it's not real bug in practice > > but found by my eyes during looking the code so I wanted to confirm the problem. > > > > Do you find that problem in real practice? or just code review? > > > > I use /sys/devices/system/memory/soft_offline_page to offline a lot of pages when the > system works on high load, then I find some unknown zero refcount pages, such as > get_any_page: 0x650422: unknown zero refcount page type 19400c00000000 > get_any_page: 0x650867: unknown zero refcount page type 19400c00000000 > > soft_offline_page() > get_any_page() > set_migratetype_isolate() > drain_all_pages() > > I think after drain_all_pages(), pcp are moved into MIGRATE_MOVABLE list which managed by > buddy allocator, but they are allocated and becaome pcp again as the system works on high > load. There will be no this problem by applying this patch. > > > Anyway, I don't like your approach which I already considered because it hurts hotpath > > while the race is really unlikely. Get_pageblock_migratetype is never trivial. > > We should avoid the overhead in hotpath and move into memory-hotplug itself. > > Do you see my patch in https://patchwork.kernel.org/patch/1225081/ ? > > Yes, you are right, I will try to find another way to fix this problem. > How about doing this work in set_migratetype_isolate(), find the pcp and change the value > of private to get_pageblock_migratetype(page)? > Allocator doesn't have any lock when he allocates the page from pcp. How could you prevent race between allocator and memory-hotplug routine(ie, set_migratetype_isolate) without hurting hotpath? -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>