On Mon, 2024-02-05 at 13:06 -0800, Haitao Huang wrote: > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Each EPC cgroup will have an LRU structure to track reclaimable EPC pages. > When a cgroup usage reaches its limit, the cgroup needs to reclaim pages > from its LRU or LRUs of its descendants to make room for any new > allocations. > > To prepare for reclamation per cgroup, expose the top level reclamation > function, sgx_reclaim_pages(), in header file for reuse. Add a parameter > to the function to pass in an LRU so cgroups can pass in different > tracking LRUs later. > [...] > Add another parameter for passing in the number of > pages to scan and make the function return the number of pages reclaimed > as a cgroup reclaimer may need to track reclamation progress from its > descendants, change number of pages to scan in subsequent calls. Firstly, sorry for late reply as I was away. >From the changelog, it's understandable you want to make this function return pages that are actually reclaimed, and perhaps it's also OK to pass the number of pages to scan. But this doesn't explain why you need to make @nr_to_scan as a pointer, while you are returning the number of pages that are actually reclaimed? And ... [...] > -/* > - * Take a fixed number of pages from the head of the active page pool and > - * reclaim them to the enclave's private shmem files. Skip the pages, which have > - * been accessed since the last scan. Move those pages to the tail of active > - * page pool so that the pages get scanned in LRU like fashion. > +/** > + * sgx_reclaim_pages() - Reclaim a fixed number of pages from an LRU > + * > + * Take a fixed number of pages from the head of a given LRU and reclaim them to > + * the enclave's private shmem files. Skip the pages, which have been accessed > + * since the last scan. Move those pages to the tail of the list so that the > + * pages get scanned in LRU like fashion. > * > * Batch process a chunk of pages (at the moment 16) in order to degrade amount ... there's no comment to explain such design either (@nr_to_scan as a pointer). Btw, with this change, seems "Take a fixed number of pages ..." and "at the moment 16" are not accurate any more. > * of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit > @@ -298,8 +300,13 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > * + EWB) but not sufficiently. Reclaiming one page at a time would also be > * problematic as it would increase the lock contention too much, which would > * halt forward progress. > + * > + * @lru: The LRU from which pages are reclaimed. > + * @nr_to_scan: Pointer to the target number of pages to scan, must be less than > + * SGX_NR_TO_SCAN. > + * Return: Number of pages reclaimed. > */ > -static void sgx_reclaim_pages(void) > +unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan) > { > struct sgx_epc_page *chunk[SGX_NR_TO_SCAN]; > struct sgx_backing backing[SGX_NR_TO_SCAN]; > @@ -310,10 +317,10 @@ static void sgx_reclaim_pages(void) > int ret; > int i; > > - spin_lock(&sgx_global_lru.lock); > - for (i = 0; i < SGX_NR_TO_SCAN; i++) { > - epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable, > - struct sgx_epc_page, list); > + spin_lock(&lru->lock); > + > + for (; *nr_to_scan > 0; --(*nr_to_scan)) { > + epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list); > if (!epc_page) > break; > > @@ -328,7 +335,8 @@ static void sgx_reclaim_pages(void) > */ > epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; > } > - spin_unlock(&sgx_global_lru.lock); > + > + spin_unlock(&lru->lock); > > for (i = 0; i < cnt; i++) { > epc_page = chunk[i]; > @@ -351,9 +359,9 @@ static void sgx_reclaim_pages(void) > continue; > > skip: > - spin_lock(&sgx_global_lru.lock); > - list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable); > - spin_unlock(&sgx_global_lru.lock); > + spin_lock(&lru->lock); > + list_add_tail(&epc_page->list, &lru->reclaimable); > + spin_unlock(&lru->lock); > > kref_put(&encl_page->encl->refcount, sgx_encl_release); > > @@ -366,6 +374,7 @@ static void sgx_reclaim_pages(void) > sgx_reclaimer_block(epc_page); > } > > + ret = 0; > for (i = 0; i < cnt; i++) { > epc_page = chunk[i]; > if (!epc_page) > @@ -378,7 +387,10 @@ static void sgx_reclaim_pages(void) > epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; > > sgx_free_epc_page(epc_page); > + ret++; > } > + > + return (unsigned int)ret; > } > Here basically the @nr_to_scan is reduced by the number of pages that are isolated for isolation, but these pages may not be actually reclaimed, e.g., due to aging. Could you clarify the reason of such choice in this patch, preferable using a comment (and/or in changelog if better)? In v8's reply you mentioned this is due to "the uncertainty of how long it takes to reclaim pages" and some other reasons, but I am not sure whether that justify. And AFAICT it also depends on how this function is called. Please also see my reply to your next patch (where it is called). That being said, does it make more sense if you can just merge this patch to your next patch for better review?