On Tue, 21 Dec 2021 22:18:06 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:

> Hi SeongJae,
> 
> On 12/21/2021 9:24 PM, SeongJae Park wrote:
> > Hi Baolin,
> > 
> > 
> > Thank you for this great patch!  I left some comments below.
> > 
> > On Tue, 21 Dec 2021 17:18:04 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> > 
> >> On tiered memory systems, the reclaim path in shrink_page_list() already
> >> supports demoting pages to a slow memory node instead of discarding the
> >> pages. However, at that time the fast memory node's watermark is already
> >> under pressure, which will increase the memory allocation latency during
> >> page demotion.
> >>
> >> We can rely on DAMON in user space to help monitor the cold memory on
> >> the fast memory node, and demote the cold pages to the slow memory node
> >> proactively to keep the fast memory node in a healthy state. Thus this
> >> patch introduces a new scheme named DAMOS_DEMOTE to support this
> >> feature.
> >>
> >> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> >> ---

[...]

> >> + */
> >> +static int damos_demote(struct damon_target *target, struct damon_region *r)
> >> +{
> >> +	struct mm_struct *mm;
> >> +	LIST_HEAD(demote_pages);
> >> +	LIST_HEAD(pagelist);
> >> +	int target_nid, nr_pages, ret = 0;
> >> +	unsigned int nr_succeeded, demoted_pages = 0;
> >> +	struct page *page, *page2;
> >> +
> >> +	/* Validate if allowing to do page demotion */
> >> +	if (!numa_demotion_enabled)
> >> +		return -EINVAL;
> >> +
> >> +	mm = damon_get_mm(target);
> >> +	if (!mm)
> >> +		return -ENOMEM;
> >> +
> >> +	mmap_read_lock(mm);
> >> +	walk_page_range(mm, PAGE_ALIGN(r->ar.start), PAGE_ALIGN(r->ar.end),

[...]

> >> +			&damos_migrate_pages_walk_ops, &demote_pages);
> >> +	mmap_read_unlock(mm);
> >> +
> >> +	mmput(mm);
> >> +	if (list_empty(&demote_pages))
> >> +		return 0;
> >> +
> >> +	list_for_each_entry_safe(page, page2, &demote_pages, lru) {
> > 
> > I'd prefer 'next' or 'next_page' instead of 'page2'.
> 
> Sure.
> 
> > 
> >> +		list_add(&page->lru, &pagelist);
> >> +		target_nid = next_demotion_node(page_to_nid(page));
> >> +		nr_pages = thp_nr_pages(page);
> >> +
> >> +		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
> >> +				    target_nid, MIGRATE_SYNC, MR_DEMOTION,
> >> +				    &nr_succeeded);
> >> +		if (ret) {
> >> +			if (!list_empty(&pagelist)) {
> >> +				list_del(&page->lru);
> >> +				mod_node_page_state(page_pgdat(page),
> >> +						NR_ISOLATED_ANON + page_is_file_lru(page),
> >> +						-nr_pages);
> >> +				putback_lru_page(page);
> >> +			}
> >> +		} else {
> >> +			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
> >> +			demoted_pages += nr_succeeded;
> >> +		}
> > 
> > Why should we do the above work for each page on our own instead of
> > exporting and calling 'demote_page_list()'?
> 
> Because demote_page_list() is used to demote cold pages which all come
> from one same fast memory node, so they can have one same target
> demotion node.
> 
> But for the regions of the process, we can get some cold pages from
> different fast memory nodes (suppose we have multiple DRAM nodes on the
> system), so their target demotion nodes may be different. Thus we
> should migrate each cold page after getting its correct target demotion
> node.

Thank you for the kind explanation.  But, I still want to reuse the code
rather than copying it if possible.  How about the dumb ideas below, off
the top of my head?

1. Call demote_page_list() for each page from here.
2. Call demote_page_list() for each page from damos_migrate_pmd_entry().
3. Make damos_migrate_pages_walk_ops configure multiple demote_pages
   lists, one per fast memory node, and call demote_page_list() here for
   each per-fast-memory-node demote_pages list (rough sketch below).
4. Make demote_page_list() handle this case and use it, e.g., a NULL
   pgdat parameter could mean the pages are not all from the same node.
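
For example, idea 3 could look something like below.  This is a
completely untested sketch just for the discussion, and it assumes
demote_page_list() gets exported from mm/vmscan.c and that the walk's
private data carries the per-node lists:

	/*
	 * One list of isolated cold pages per possible source node.  A
	 * real implementation would likely need to allocate this, since
	 * MAX_NUMNODES can be too large for the kernel stack.
	 */
	struct list_head demote_lists[MAX_NUMNODES];
	unsigned int demoted_pages = 0;
	int nid;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		INIT_LIST_HEAD(&demote_lists[nid]);

	/*
	 * damos_migrate_pages_walk_ops would then isolate each cold page
	 * into the list of the node it currently resides on, e.g.:
	 *
	 *	list_add(&page->lru, &demote_lists[page_to_nid(page)]);
	 */

	for (nid = 0; nid < MAX_NUMNODES; nid++) {
		if (list_empty(&demote_lists[nid]))
			continue;
		/* demote_page_list() finds next_demotion_node() itself */
		demoted_pages += demote_page_list(&demote_lists[nid],
						  NODE_DATA(nid));
	}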

Please let me know if I'm missing something.


Thanks,
SJ

> 
> Thanks for your comments.