On Mon, 6 Sep 2010 00:57:53 +0900
Minchan Kim <minchan.kim@xxxxxxxxx> wrote:

> >
> > Thanks,
> > -Kame
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> >
> > This patch adds a memory allocator for contiguous memory larger than
> > MAX_ORDER.
> >
> > 	alloc_contig_pages(hint, size, list);
> >
> > This function allocates 'size' bytes of contiguous pages whose physical
> > address is higher than 'hint'. size is specified in byte units.
>
> size is in bytes, hint is a pfn?
>
hint is a physical address. What's annoying me is x86-32; should I use
phys_addr_t or a pfn ....

> > Allocated pages are all linked into the list and all of their
> > page_count() are set to 1. The return value is the top page.
> >
> > 	free_contig_pages(list)
> > 	returns all pages in the list.
> >
> > This patch does
> > - find an area which can be ISOLATED.
> > - migrate remaining pages in the area.
>
> Migrate from there to where?
>
somewhere.

> > - steal the chunk of pages from the allocator.
> >
> > Limitations are:
> > - returned pages will be aligned to MAX_ORDER.
> > - the returned length of pages will be aligned to MAX_ORDER.
> >   (so, the caller may have to return the tail pages by itself.)
>
> What do you mean by tail?
>
Ah, the allocator returns MAX_ORDER aligned pages, then,

	[xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyy]
	 x+y = allocated
	 x   = will be used.
	 y   = will be unused.

I call 'y' the tail here.

> > - may allocate contiguous pages which overlap nodes/zones.
>
> Hmm.. Do we really need this?
>
Unnecessary; please consider this a BUG. This code only checks the pfns
of the allocated area but doesn't check which zone/node each pfn is tied
to. For example, I hear IBM has the following kind of memory layout:

	| Node0 | Node1 | Node2 | Node0 | Node2 | Node1 | .....

So, a check should be added to avoid allocating a chunk of pages that
spreads out over multiple nodes. (I hope walk_page_range() can do enough
of the job for us, but I'm not sure; I need to add a "zone" check, at
least.)
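Something like this completely untested helper could be a first step for
that check (range_in_single_zone() is just a made-up name for
illustration, not part of the patch):

/*
 * Untested sketch: return true only when [start_pfn, end_pfn) is fully
 * valid and stays within a single zone (and therefore a single node).
 * __find_contig_block() could skip any candidate block failing this.
 */
static bool range_in_single_zone(unsigned long start_pfn,
				 unsigned long end_pfn)
{
	struct zone *zone = NULL;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!pfn_valid(pfn))
			return false;
		if (!zone)
			zone = page_zone(pfn_to_page(pfn));
		else if (page_zone(pfn_to_page(pfn)) != zone)
			return false;
	}
	return true;
}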
> > This is fully experimental and written as an example.
> > (Maybe more patches are needed to make this complete.)
>
> Yes. But my first impression of this patch is good.
>
> > This patch moves some code from memory_hotplug.c to page_isolation.c
> > and is based on the page-offline technique used by memory_hotplug.c.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > ---
> >  include/linux/page-isolation.h |   10 +
> >  mm/memory_hotplug.c            |   84 --------------
> >  mm/page_alloc.c                |   32 +++++
> >  mm/page_isolation.c            |  244 +++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 287 insertions(+), 83 deletions(-)
> >
> > Index: mmotm-0827/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-0827.orig/mm/page_isolation.c
> > +++ mmotm-0827/mm/page_isolation.c
> > @@ -3,8 +3,11 @@
> >   */
> >
> >  #include <linux/mm.h>
> > +#include <linux/swap.h>
> >  #include <linux/page-isolation.h>
> >  #include <linux/pageblock-flags.h>
> > +#include <linux/mm_inline.h>
> > +#include <linux/migrate.h>
> >  #include "internal.h"
> >
> >  static inline struct page *
> > @@ -140,3 +143,244 @@ int test_pages_isolated(unsigned long st
> >  	spin_unlock_irqrestore(&zone->lock, flags);
> >  	return ret ? 0 : -EBUSY;
> >  }
> > +
> > +#define CONTIG_ALLOC_MIGRATION_RETRY	(5)
> > +
> > +/*
> > + * Scanning pfns is much easier than scanning the LRU list.
> > + * Scan pfns from start to end and find the first LRU page.
> > + */
> > +unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > +{
> > +	unsigned long pfn;
> > +	struct page *page;
> > +
> > +	for (pfn = start; pfn < end; pfn++) {
> > +		if (pfn_valid(pfn)) {
> > +			page = pfn_to_page(pfn);
> > +			if (PageLRU(page))
> > +				return pfn;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> > +/* Migrate all LRU pages in the range to somewhere else */
> > +static struct page *
> > +hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
> > +{
> > +	/* This should be improooooved!! */
>
> Yeb.
>
> > +	return alloc_page(GFP_HIGHUSER_MOVABLE);
> > +}
>
> <snip>
>
> > +struct page *alloc_contig_pages(unsigned long long hint,
> > +		unsigned long size, struct list_head *list)
> > +{
> > +	unsigned long base, found, end, pages, start;
> > +	struct page *ret = NULL;
> > +	int nid, retry;
> > +
> > +	if (hint)
> > +		hint = ALIGN(hint, MAX_ORDER_NR_PAGES);
> > +	/* request size should be aligned to pageblock */
> > +	size >>= PAGE_SHIFT;
> > +	pages = ALIGN(size, MAX_ORDER_NR_PAGES);
> > +	found = 0;
> > +retry:
> > +	for_each_node_state(nid, N_HIGH_MEMORY) {
> > +		unsigned long node_end;
> > +		pg_data_t *node = NODE_DATA(nid);
> > +
> > +		node_end = node->node_start_pfn + node->node_spanned_pages;
> > +		/* does this node have a proper range of memory ? */
> > +		if (node_end < hint + pages)
> > +			continue;
> > +		base = hint;
> > +		if (base < node->node_start_pfn)
> > +			base = node->node_start_pfn;
> > +
> > +		base = ALIGN(base, MAX_ORDER_NR_PAGES);
> > +		found = 0;
> > +		end = node_end & ~(MAX_ORDER_NR_PAGES - 1);
> > +		/* Maybe we can use this Node */
> > +		if (base + pages < end)
> > +			found = __find_contig_block(base, end, pages);
> > +		if (found)	/* Found ? */
> > +			break;
> > +		base = hint;
> > +	}
> > +	if (!found)
> > +		goto out;
> > +	/*
> > +	 * Ok, here we have a contiguous pageblock marked as "isolated";
> > +	 * try migration.
> > +	 */
> > +	retry = CONTIG_ALLOC_MIGRATION_RETRY;
> > +	end = found + pages;
>
> Hmm.. I can't understand the loop below.
> Maybe it needs refactoring.
>
yes. Just moved from the memory hotplug code.

> > +	for (start = scan_lru_pages(found, end); start < end;) {
> > +		if (do_migrate_range(found, end)) {
> > +			/* migration failure ... */
> > +			if (retry-- < 0)
> > +				break;
> > +			/* take a rest and synchronize LRU etc. */
> > +			lru_add_drain_all();
> > +			flush_scheduled_work();
> > +			cond_resched();
> > +			drain_all_pages();
> > +		}
> > +		start = scan_lru_pages(start, end);
> > +		if (!start)
> > +			break;
> > +	}
>
> <snip>
>
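Thinking about it a bit more, the intent of that loop might read better
as a helper like this (completely untested; migrate_range_with_retry()
is a made-up name, and unlike the code above it puts a hard bound on the
number of passes):

/*
 * Untested sketch: migrate every LRU page in [start, end) away,
 * allowing a few passes for migration failures and newly-faulted pages.
 * Returns 0 when no LRU page is left in the range, -EBUSY otherwise.
 */
static int migrate_range_with_retry(unsigned long start, unsigned long end)
{
	int pass;

	for (pass = 0; pass < CONTIG_ALLOC_MIGRATION_RETRY; pass++) {
		/* nothing left on the LRU means we are done */
		if (!scan_lru_pages(start, end))
			return 0;
		if (do_migrate_range(start, end)) {
			/* take a rest and synchronize LRU etc. */
			lru_add_drain_all();
			flush_scheduled_work();
			cond_resched();
			drain_all_pages();
		}
	}
	return scan_lru_pages(start, end) ? -EBUSY : 0;
}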
> > +void alloc_contig_freed_pages(unsigned long pfn, unsigned long end,
> > +		struct list_head *list)
> > +{
> > +	struct page *page;
> > +	struct zone *zone;
> > +	int i, order;
> > +
> > +	zone = page_zone(pfn_to_page(pfn));
> > +	spin_lock_irq(&zone->lock);
> > +	while (pfn < end) {
> > +		VM_BUG_ON(!pfn_valid(pfn));
> > +		page = pfn_to_page(pfn);
> > +		VM_BUG_ON(page_count(page));
> > +		VM_BUG_ON(!PageBuddy(page));
> > +		list_del(&page->lru);
> > +		order = page_order(page);
> > +		zone->free_area[order].nr_free--;
> > +		rmv_page_order(page);
> > +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> > +		for (i = 0; i < (1 << order); i++) {
> > +			struct page *x = page + i;
> > +			list_add(&x->lru, list);
> > +		}
> > +		page += 1 << order;
>                         ^ pfn?
>
yes, the patch on my tree has "pfn" .... refresh miss :(.

> > +	}
> > +	spin_unlock_irq(&zone->lock);
> > +
> > +	/* After this, pages on the list can be freed one by one */
> > +	list_for_each_entry(page, list, lru)
> > +		prep_new_page(page, 0, 0);
> > +}
> > +
> >  #ifdef CONFIG_MEMORY_HOTREMOVE
> >  /*
> >   * All pages in the range must be isolated before calling this.

Thank you for the review.

Regards,
-Kame
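P.S. To make the intended usage clearer, here is a minimal, completely
untested example of a caller of the interface described above (assuming
free_contig_pages() takes the list head; error handling omitted):

/*
 * Untested example: allocate 4MB of physically contiguous memory whose
 * physical address is above 16MB, then return all of it.
 */
static int alloc_contig_example(void)
{
	LIST_HEAD(pages);
	struct page *first;

	/* hint = 16MB physical address, size = 4MB in bytes */
	first = alloc_contig_pages(16ULL << 20, 4UL << 20, &pages);
	if (!first)
		return -ENOMEM;

	/*
	 * 'first' is the top page. Remember the length is rounded up to
	 * MAX_ORDER, so unused tail pages may have to be returned
	 * separately by the caller.
	 */
	free_contig_pages(&pages);
	return 0;
}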