On Mon Feb 26, 2024 at 11:56 PM EET, Dave Hansen wrote:
> On 2/26/24 13:48, Haitao Huang wrote:
> > In case of overcomitting, i.e., sum of limits greater than the EPC
> > capacity, if one group has a fault, and its usage is not above its own
> > limit (try_charge() passes), yet total usage of the system has exceeded
> > the capacity, whether we do global reclaim or just reclaim pages in the
> > current faulting group.
>
> I don't see _any_ reason to limit reclaim to the current faulting cgroup.
>
> >> Last, what is the simplest (least amount of code) thing that the SGX
> >> cgroup controller could implement here?
> >
> > I still think the current approach of doing global reclaim is reasonable
> > and simple: try_charge() checks cgroup limit and reclaim within the
> > group if needed, then do EPC page allocation, reclaim globally if
> > allocation fails due to global usage reaches the capacity.
> >
> > I'm not sure how not doing global reclaiming in this case would bring
> > any benefit.
> I tend to agree.
>
> Kai, I think your examples sound a little bit contrived. Have actual
> users expressed a strong intent for doing anything with this series
> other than limiting bad actors from eating all the EPC?

I'd consider this from the viewpoint of whether there is anything in the
user-space-visible portion of the patch set that would limit tuning the
performance later on, if that is required by, say, a workload that acts
sub-optimally.

If not, then most performance-related issues can only be identified by
actual use of the code.

BR, Jarkko