Hi Mel, On 02/05/2013 09:32 PM, Mel Gorman wrote: > On Tue, Feb 05, 2013 at 11:57:22AM +0000, Mel Gorman wrote: >> >>>> + migrate_pre_flag = 1; >>>> + } >>>> + >>>> + if (!isolate_lru_page(pages[i])) { >>>> + inc_zone_page_state(pages[i], NR_ISOLATED_ANON + >>>> + page_is_file_cache(pages[i])); >>>> + list_add_tail(&pages[i]->lru, &pagelist); >>>> + } else { >>>> + isolate_err = 1; >>>> + goto put_page; >>>> + } >> >> isolate_lru_page() takes the LRU lock every time. > > Credit to Michal Hocko for bringing this up but with the number of > other issues I missed that this is also broken with respect to huge page > handling. hugetlbfs pages will not be on the LRU so the isolation will mess > up and the migration has to be handled differently. Ordinarily hugetlbfs > pages cannot be allocated from ZONE_MOVABLE but it is possible to configure > it to be allowed via /proc/sys/vm/hugepages_treat_as_movable. If this > encounters a hugetlbfs page, it'll just blow up. I look into the migrate_huge_page() codes find that if we support the hugetlbfs non movable migration, we have to invent another alloc_huge_page_node_nonmovable() or such allocate interface, which cost is large(exploding the codes and great impact on current alloc_huge_page_node()) but gains little, I think that pinning hugepage is a corner case. So can we skip over hugepage without migration but give some WARN_ON() info, is it acceptable? > > The other is that this almost certainly broken for transhuge page > handling. gup returns the head and tail pages and ordinarily this is ok I can't find codes doing such things :(, could you please point me out? > because the caller only cares about the physical address. Migration will > also split a hugepage if it receives it but you are potentially adding > tail pages to a list here and then migrating them. The split of the first > page will get very confused. I'm not exactly sure what the result will be > but it won't be pretty. > > Was THP enabled when this was tested? Was CONFIG_DEBUG_LIST enabled > during testing? I checked my config file that both CONFIG options aboved are enabled. However it was only be tested by two services invoking io_setup(), it works fine.. thanks, linfeng -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html