On 09/07/23 14:19, Muchun Song wrote:
> 
> 
> > On Sep 7, 2023, at 05:38, Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > 
> > On 09/06/23 15:38, Muchun Song wrote:
> >> 
> >> 
> >> On 2023/9/6 05:44, Mike Kravetz wrote:
> >>> Now that batching of hugetlb vmemmap optimization processing is possible,
> >>> batch the freeing of vmemmap pages.  When freeing vmemmap pages for a
> >>> hugetlb page, we add them to a list that is freed after the entire batch
> >>> has been processed.
> >>> 
> >>> This enhances the ability to return contiguous ranges of memory to the
> >>> low level allocators.
> >>> 
> >>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> >>> ---
> >>>  mm/hugetlb_vmemmap.c | 60 ++++++++++++++++++++++++++++----------------
> >>>  1 file changed, 38 insertions(+), 22 deletions(-)
> >>> 
> >>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> >>> index 79de984919ef..a715712df831 100644
> >>> --- a/mm/hugetlb_vmemmap.c
> >>> +++ b/mm/hugetlb_vmemmap.c
> >>> @@ -306,18 +306,21 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> >>>   * @end: end address of the vmemmap virtual address range that we want to
> >>>   *       remap.
> >>>   * @reuse: reuse address.
> >>> + * @vmemmap_pages: list to deposit vmemmap pages to be freed.  It is callers
> >>> + *		responsibility to free pages.
> >>>   *
> >>>   * Return: %0 on success, negative error code otherwise.
> >>>   */
> >>>  static int vmemmap_remap_free(unsigned long start, unsigned long end,
> >>> -			      unsigned long reuse)
> >>> +			      unsigned long reuse,
> >>> +			      struct list_head *vmemmap_pages)
> >>>  {
> >>>  	int ret;
> >>> -	LIST_HEAD(vmemmap_pages);
> >>> +	LIST_HEAD(freed_pages);
> >> 
> >> IIUC, we could reuse the parameter of @vmemmap_pages directly instead of
> >> a temporary variable, could it be dropped?
> >> 
> > I was concerned about the error case where we call vmemmap_remap_range a
> > second time.  In the first call to vmemmap_remap_range with vmemmap_remap_pte,
> > vmemmap pages to be freed are added to the end of the list (list_add_tail).
> > In the call to vmemmap_remap_range after error with vmemmap_restore_pte,
> > pages are taken off the head of the list (list_first_entry).  So, it seems
> > that it would be possible to use a different set of pages in the restore
> 
> Yes.
> 
> > operation.  This would be an issue if pages had different characteristics such
> > as being on different nodes.  Is that a real concern?
> 
> A good point. Now I see your concern, it is better to keep the same node
> as before when error occurs.
> 
> > 
> > I suppose we could change vmemmap_remap_pte to add pages to the head of
> > the list?  I do not recall the reasoning behind adding to tail.
> 
> I think we could do this, the code will be a little simple. Actually, there
> is no reason behind adding to tail (BTW, the first commit is introduced by
> me, no secret here :-)).

Ok, I will change the way pages are added and removed from the list so that
in case of error we get the same pages.  Then I can remove the local list.
-- 
Mike Kravetz
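
For readers following along, here is a minimal sketch of the change Mike
describes above, against the vmemmap_remap_pte()/vmemmap_restore_pte()
helpers in mm/hugetlb_vmemmap.c: switch the producer from list_add_tail()
to list_add() so that the consumer, which pops list_first_entry() from the
head of the list, walks back over exactly the pages it just removed when
vmemmap_remap_range() fails.  This is only an illustration, not the actual
follow-up patch; everything except the list handling is elided, and the
struct vmemmap_remap_walk layout is assumed from the existing file.

/*
 * Sketch only -- function bodies trimmed to the list handling.
 * walk->vmemmap_pages is the caller-supplied list of freed pages.
 */
static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
			      struct vmemmap_remap_walk *walk)
{
	struct page *page = pte_page(*pte);

	/*
	 * Add freed vmemmap pages to the head of the list (LIFO) instead
	 * of list_add_tail(), so a later restore pass hands back exactly
	 * the pages that were just removed for this range.
	 */
	list_add(&page->lru, walk->vmemmap_pages);

	/* ... remap the pte to the shared reuse page ... */
}

static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
				struct vmemmap_remap_walk *walk)
{
	/*
	 * Pages come off the head of the list; with head insertion above,
	 * the error path restores the same pages (and therefore the same
	 * node) that the failed remap removed.
	 */
	struct page *page = list_first_entry(walk->vmemmap_pages,
					     struct page, lru);

	list_del(&page->lru);

	/* ... copy the data back and remap the pte to this page ... */
}

With head insertion, the restore order is simply the reverse of the removal
order, so an error part-way through a batch puts each range back together
from the pages it originally held, addressing the node-locality concern
raised in the thread and allowing the local freed_pages list to be dropped.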