On Thu, Sep 06, 2012 at 01:49:03PM +0900, Minchan Kim wrote: > > > __offline_isolated_pages > > > /* > > > * BUG_ON hit or offline page > > > * which is used by someone > > > */ > > > BUG_ON(!PageBuddy(page A)); > > > > > > > offline_page calling BUG_ON because someone allocated the page is > > ridiculous. I did not spot where that check is but it should be changed. The > > correct action is to retry the isolation. > > It is where __offline_isolated_pges. > > .. > while (pfn < end_pfn) { > if (!pfn_valid(pfn)) { > pfn++; > continue; > } > page = pfn_to_page(pfn); > BUG_ON(page_count(page)); > BUG_ON(!PageBuddy(page)); <---- HERE > order = page_order(page); > ... > > Comment of offline_isolated_pages says following as. > > We cannot do rollback at this point > > So if the comment is true, BUG_ON does make sense to me. It's massive overkill. I see no reason why it cannot return EBUSY all the way back up to offline_pages() and retry with the migration step. It would both remove that BUG_ON and improve reliability of memory hot-remove. > But I don't see why we can't retry it as I look thorugh code. > Anyway, It's another story which isn't related to this patch. > True. > > > > > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx> > > > > At no point in the changelog do you actually say what he patch does :/ > > Argh, I will do. > > > > > > --- > > > mm/page_isolation.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > > > index acf65a7..4699d1f 100644 > > > --- a/mm/page_isolation.c > > > +++ b/mm/page_isolation.c > > > @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) > > > continue; > > > } > > > page = pfn_to_page(pfn); > > > - if (PageBuddy(page)) > > > + if (PageBuddy(page)) { > > > + if (get_page_migratetype(page) != MIGRATE_ISOLATE) > > > + break; > > > pfn += 1 << page_order(page); > > > + } > > > > It is possible the page is moved to the MIGRATE_ISOLATE list between when > > the page was freed to the buddy allocator and this check was made. The > > page->index information is stale and the impact is that the hotplug > > operation fails when it could have succeeded. That said, I think it is a > > very unlikely race that will never happen in practice. > > I understand you mean move_freepages which I have missed. Right? Yes. > Then, I will fix it, too. > > > > > More importantly, the effect of this path is that EBUSY gets bubbled all > > the way up and the hotplug operations fails. This is fine but as the page > > is free at the time this problem is detected you also have the option > > of moving the PageBuddy page to the MIGRATE_ISOLATE list at this time > > if you take the zone lock. This will mean you need to change the name of > > test_pages_isolated() of course. > > Sorry, I can't get your point. Could you elaborate it more? You detect a PageBuddy page but it's on the wrong list. Instead of returning and failing memory-hotremove, move the free page to the correct list at the time it is detected. > Is it related to this patch? No, it's not important and was a suggestion on how it could be made better. However, retrying hot-remove would be even better again. I'm not suggesting it be done as part of this series. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>