> On 12/21/20 11:46 PM, Liang Li wrote:
> > Free page reporting only supports buddy pages, it can't report the
> > free pages reserved for hugetlbfs case. On the other hand, hugetlbfs
> > is a good choice for a system with a huge amount of RAM, because it
> > can help to reduce the memory management overhead and improve system
> > performance.
> > This patch add the support for reporting hugepages in the free list
> > of hugetlb, it canbe used by virtio_balloon driver for memory
> > overcommit and pre zero out free pages for speeding up memory population.
>
> My apologies as I do not follow virtio_balloon driver. Comments from
> the hugetlb perspective.

Any comments are welcome.

> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -5531,6 +5537,29 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int fla
> >  	return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> >  }
> >
> > +bool isolate_free_huge_page(struct page *page, struct hstate *h, int nid)
>
> Looks like this always returns true. Should it be type void?

Will change in the next revision.

> > +{
> > +	bool ret = true;
> > +
> > +	VM_BUG_ON_PAGE(!PageHead(page), page);
> > +
> > +	list_move(&page->lru, &h->hugepage_activelist);
> > +	set_page_refcounted(page);
> > +	h->free_huge_pages--;
> > +	h->free_huge_pages_node[nid]--;
> > +
> > +	return ret;
> > +}
> > +
> > ...
> > +static void
> > +hugepage_reporting_drain(struct page_reporting_dev_info *prdev,
> > +			 struct hstate *h, struct scatterlist *sgl,
> > +			 unsigned int nents, bool reported)
> > +{
> > +	struct scatterlist *sg = sgl;
> > +
> > +	/*
> > +	 * Drain the now reported pages back into their respective
> > +	 * free lists/areas. We assume at least one page is populated.
> > +	 */
> > +	do {
> > +		struct page *page = sg_page(sg);
> > +
> > +		putback_isolate_huge_page(h, page);
> > +
> > +		/* If the pages were not reported due to error skip flagging */
> > +		if (!reported)
> > +			continue;
> > +
> > +		__SetPageReported(page);
> > +	} while ((sg = sg_next(sg)));
> > +
> > +	/* reinitialize scatterlist now that it is empty */
> > +	sg_init_table(sgl, nents);
> > +}
> > +
> > +/*
> > + * The page reporting cycle consists of 4 stages, fill, report, drain, and
> > + * idle. We will cycle through the first 3 stages until we cannot obtain a
> > + * full scatterlist of pages, in that case we will switch to idle.
> > + */
>
> As mentioned, I am not familiar with virtio_balloon and the overall design.
> So, some of this does not make sense to me.
>
> > +static int
> > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> > +			 struct hstate *h, unsigned int nid,
> > +			 struct scatterlist *sgl, unsigned int *offset)
> > +{
> > +	struct list_head *list = &h->hugepage_freelists[nid];
> > +	unsigned int page_len = PAGE_SIZE << h->order;
> > +	struct page *page, *next;
> > +	long budget;
> > +	int ret = 0, scan_cnt = 0;
> > +
> > +	/*
> > +	 * Perform early check, if free area is empty there is
> > +	 * nothing to process so we can skip this free_list.
> > +	 */
> > +	if (list_empty(list))
> > +		return ret;
>
> Do note that not all entries on the hugetlb free lists are free. Reserved
> entries are also on the free list. The actual number of free entries is
> 'h->free_huge_pages - h->resv_huge_pages'.
> Is the intention to process reserved pages as well as free pages?

Yes, reserved pages are treated as 'free pages'.

> > +
> > +	spin_lock_irq(&hugetlb_lock);
> > +
> > +	if (huge_page_order(h) > MAX_ORDER)
> > +		budget = HUGEPAGE_REPORTING_CAPACITY;
> > +	else
> > +		budget = HUGEPAGE_REPORTING_CAPACITY * 32;
> > +
> > +	/* loop through free list adding unreported pages to sg list */
> > +	list_for_each_entry_safe(page, next, list, lru) {
> > +		/* We are going to skip over the reported pages. */
> > +		if (PageReported(page)) {
> > +			if (++scan_cnt >= MAX_SCAN_NUM) {
> > +				ret = scan_cnt;
> > +				break;
> > +			}
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * If we fully consumed our budget then update our
> > +		 * state to indicate that we are requesting additional
> > +		 * processing and exit this list.
> > +		 */
> > +		if (budget < 0) {
> > +			atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED);
> > +			next = page;
> > +			break;
> > +		}
> > +
> > +		/* Attempt to pull page from list and place in scatterlist */
> > +		if (*offset) {
> > +			isolate_free_huge_page(page, h, nid);
>
> Once a hugetlb page is isolated, it can not be used and applications that
> depend on hugetlb pages can start to fail.
> I assume that is acceptable/expected behavior. Correct?
> On some systems, hugetlb pages are a precious resource and the sysadmin
> carefully configures the number needed by applications. Removing a hugetlb
> page (even for a very short period of time) could cause serious application
> failure.

That's true, especially for 1G pages. Any suggestions?
Let the hugepage allocator be aware of this situation and retry?

> My apologies if that is a stupid question. I really have no knowledge of
> this area.
>
> Mike Kravetz

Thanks for your comments, Mike.

Liang
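
P.S. To make the free-versus-reserved accounting raised above concrete, here
is a minimal userspace sketch. It is not part of the patch. The struct
hstate_model and the helpers unreserved_free_pages() and
clamp_report_budget() are hypothetical names invented for illustration; only
the expression 'h->free_huge_pages - h->resv_huge_pages' comes from the
review comment. It models one possible way a reporting pass could limit
itself to entries that do not back a reservation, under the assumption that
both counters are read under the same lock.

```c
#include <assert.h>

/* Simplified stand-in for the two struct hstate counters discussed above. */
struct hstate_model {
	unsigned long free_huge_pages;	/* entries on the free lists */
	unsigned long resv_huge_pages;	/* entries promised to reservations */
};

/* Hypothetical helper: free-list entries that do not back a reservation. */
unsigned long unreserved_free_pages(const struct hstate_model *h)
{
	/* Guard against unsigned underflow if the counters are momentarily equal or inverted. */
	if (h->free_huge_pages <= h->resv_huge_pages)
		return 0;
	return h->free_huge_pages - h->resv_huge_pages;
}

/* Hypothetical helper: never isolate more pages than are truly unreserved. */
unsigned long clamp_report_budget(const struct hstate_model *h,
				  unsigned long budget)
{
	unsigned long avail = unreserved_free_pages(h);

	return budget < avail ? budget : avail;
}

/* Small usage example with made-up numbers. */
void accounting_example(void)
{
	struct hstate_model h = { .free_huge_pages = 10, .resv_huge_pages = 4 };

	/* 10 entries on the free list, 4 reserved: 6 are actually free. */
	assert(unreserved_free_pages(&h) == 6);
	/* A budget of 16 would be cut down to the 6 unreserved pages. */
	assert(clamp_report_budget(&h, 16) == 6);
}
```

With a clamp like this, reserved pages would stay on the free list during a
reporting pass, at the cost of never reporting them. Whether that trade-off
is acceptable is an open question for this series.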