On Wed, 22 Dec 2021 17:15:18 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > > > On 12/22/2021 4:43 PM, SeongJae Park wrote: > > On Tue, 21 Dec 2021 22:18:06 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > > > >> Hi SeongJae, > >> > >> On 12/21/2021 9:24 PM, SeongJae Park Wrote: > >>> Hi Baolin, > >>> > >>> > >>> Thank you for this great patch! I left some comments below. > >>> > >>> On Tue, 21 Dec 2021 17:18:04 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > >>> > >>>> On tiered memory system, the reclaim path in shrink_page_list() already > >>>> support demoting pages to slow memory node instead of discarding the > >>>> pages. However, at that time the fast memory node memory wartermark is > >>>> already tense, which will increase the memory allocation latency during > >>>> page demotion. > >>>> > >>>> We can rely on the DAMON in user space to help to monitor the cold > >>>> memory on fast memory node, and demote the cold pages to slow memory > >>>> node proactively to keep the fast memory node in a healthy state. > >>>> Thus this patch introduces a new scheme named DAMOS_DEMOTE to support > >>>> this feature. > >>>> > >>>> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> > >>>> --- > > [snip] > > >> > >>> > >>>> + list_add(&page->lru, &pagelist); > >>>> + target_nid = next_demotion_node(page_to_nid(page)); > >>>> + nr_pages = thp_nr_pages(page); > >>>> + > >>>> + ret = migrate_pages(&pagelist, alloc_demote_page, NULL, > >>>> + target_nid, MIGRATE_SYNC, MR_DEMOTION, > >>>> + &nr_succeeded); > >>>> + if (ret) { > >>>> + if (!list_empty(&pagelist)) { > >>>> + list_del(&page->lru); > >>>> + mod_node_page_state(page_pgdat(page), > >>>> + NR_ISOLATED_ANON + page_is_file_lru(page), > >>>> + -nr_pages); > >>>> + putback_lru_page(page); > >>>> + } > >>>> + } else { > >>>> + __count_vm_events(PGDEMOTE_DIRECT, nr_succeeded); > >>>> + demoted_pages += nr_succeeded; > >>>> + } > >>> > >>> Why should we do above work for each page on our own instead of exporting and > >>> calling 'demote_page_list()'? > >> > >> Cuase demote_page_list() is used for demote cold pages which are from > >> one same fast memory node, they can have one same target demotion node. > >> > >> But for the regions for the process, we can get some cold pages from > >> different fast memory nodes (suppose we have mulptiple dram node on the > >> system), so its target demotion node may be different. Thus we should > >> migrate each cold pages with getting the correct target demotion node. > > > > Thank you for the kind explanation. But, I still want to reuse the code rather > > than copying if possible. How about below dumb ideas off the top of my head? > > > > 1. Call demote_page_list() for each page from here > > Sounds reasonable. > > > 2. Call demote_page_list() for each page from damos_migrate_pmd_entry() > > We are under mmap lock in damos_migrate_pmd_entry(), it is not suitable > to do page migration. > > > 3. Make damos_migrate_pages_walk_ops to configure multiple demote_pages lists, > > each per fast memory node, and calling demote_page_list() here for each > > per-fast-memory-node demote_pages list. > > Which will bring more complexity I think, and we should avoid doing > migration under mmap lock. > > > 4. Make demote_page_list() handles this case and use it. e.g., NULL pgdat > > parameter means the pages are not from same node. > > Thanks for your suggestion, actually after more thinking, I think we can > reuse the demote_page_list() and it can be easy to change. Somethink > like below on top of my current patch, how do you think? Thanks. So, you selected the option 1. I personally think option 3 or 4 would be more optimal, but also agree that it could increase the complexity. As we already have the time/space quota feature for schemes overhead control, I think starting with this simple approach makes sense to me. Thanks, SJ [...]