On Mon, 06 May 2024 19:10:42 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
On 1/05/2024 7:51 am, Haitao Huang wrote:
 static void sgx_reclaim_pages_global(struct mm_struct *charge_mm)
 {
-	sgx_reclaim_pages(&sgx_global_lru, charge_mm);
+	if (IS_ENABLED(CONFIG_CGROUP_MISC))
+		sgx_cgroup_reclaim_pages(misc_cg_root(), charge_mm);
+	else
+		sgx_reclaim_pages(&sgx_global_lru, charge_mm);
 }
I think we have a problem here when we do global reclaim starting from
the ROOT cgroup:
This function will mostly just try to reclaim from the ROOT cgroup,
but won't reclaim from the descendants.
The reason is that sgx_cgroup_reclaim_pages() will simply return after
"scanning" SGX_NR_TO_SCAN (16) pages w/o going to the descendants, and
"scanning" here simply means "removing the EPC page from the
cgroup's LRU list".
So as long as the ROOT cgroup LRU contains more than SGX_NR_TO_SCAN (16)
pages, effectively sgx_cgroup_reclaim_pages() will just scan and return
w/o going into the descendants. Having 16 EPC pages should be an "almost
always true" case I suppose.
When sgx_reclaim_pages_global() is called again, we will start from
the ROOT again.
That means this doesn't truly reclaim "from global" at all.
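To make the argument above concrete, here is a rough user-space sketch of the walk as I understand it. It is only a toy model, not the kernel code: struct toy_cg, toy_reclaim_pass() and MAX_CHILDREN are made-up names, an LRU is reduced to a page counter, and only the value 16 for NR_TO_SCAN comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TO_SCAN   16  /* mirrors SGX_NR_TO_SCAN (16) from the discussion */
#define MAX_CHILDREN 4   /* arbitrary limit for this toy model */

/* Hypothetical stand-in for a cgroup: a page count instead of a real LRU. */
struct toy_cg {
	int lru_pages;                       /* pages currently on this LRU */
	int scanned;                         /* pages taken off the LRU so far */
	struct toy_cg *child[MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Pre-order walk starting at @cg. "Scanning" just removes pages from the
 * LRU, and the walk stops as soon as @budget pages have been scanned.
 * Returns the number of pages scanned in this subtree.
 */
static int toy_reclaim_pass(struct toy_cg *cg, int budget)
{
	int n, done;
	size_t i;

	assert(budget >= 0);

	/* Take as many pages as the budget allows from this cgroup first. */
	n = cg->lru_pages < budget ? cg->lru_pages : budget;
	cg->lru_pages -= n;
	cg->scanned += n;
	done = n;

	/* Descendants are visited only if budget is left over. */
	for (i = 0; i < cg->nr_children && done < budget; i++)
		done += toy_reclaim_pass(cg->child[i], budget - done);

	return done;
}
```

In this model, a root holding more than 16 pages absorbs the whole budget of a toy_reclaim_pass(&root, NR_TO_SCAN) call, so the child is never visited. The child is only reached once the root's LRU drops below 16 pages.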
IMHO the behaviour of sgx_cgroup_reclaim_pages() is OK for per-cgroup
reclaim because I think in this case our intention is to try our best
to reclaim from the cgroup, i.e., whether we can reclaim from
descendants doesn't matter.
But for global reclaim this doesn't work.
Am I missing anything?
Good catch. This is indeed a problem if pages in a higher level cgroup are
always busy (being 'young'). The reclamation loop starting from this group
may be stuck in only shifting the pages from front to tail in this group
and never try to scan & reclaim pages in its descendants.
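A rough user-space sketch of that stuck case, for illustration only. Nothing here is kernel code: struct toy_lru, toy_global_reclaim() and the always_young flag are made-up names, and "always busy" is modelled by assuming every scanned page is young again by the next pass and so gets rotated to the tail rather than reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TO_SCAN 16  /* mirrors SGX_NR_TO_SCAN (16) from the discussion */

/* Hypothetical model of one cgroup's LRU, reduced to counters. */
struct toy_lru {
	int pages;          /* pages on the LRU */
	bool always_young;  /* assumption: pages are re-accessed between passes */
	int scanned;        /* pages looked at across all passes */
	int reclaimed;      /* pages actually reclaimed */
};

/* One reclaim pass over a parent followed by its single child. */
static void toy_global_reclaim(struct toy_lru *parent, struct toy_lru *child)
{
	struct toy_lru *order[2] = { parent, child };
	int budget = NR_TO_SCAN;
	int i, n;

	assert(parent != NULL && child != NULL);

	for (i = 0; i < 2 && budget > 0; i++) {
		struct toy_lru *lru = order[i];

		n = lru->pages < budget ? lru->pages : budget;
		lru->scanned += n;
		budget -= n;

		if (!lru->always_young) {
			/* Old pages leave the LRU and are reclaimed. */
			lru->pages -= n;
			lru->reclaimed += n;
		}
		/* Young pages only move from front to tail: count unchanged. */
	}
}
```

With a parent of 64 always-young pages, each pass spends its whole budget on the parent, so repeated passes reclaim nothing and never touch the child, even though the child's pages are reclaimable in this model.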
Though this may not happen often, I think it does require a fix. Will do
it in v14 :-)
Thanks
Haitao