Hi Dave,
On Wed, 03 Jan 2024 10:37:35 -0600, Dave Hansen <dave.hansen@xxxxxxxxx>
wrote:
> On 12/18/23 13:24, Haitao Huang wrote:
>> @Dave and @Michal, Your thoughts? Or could you confirm we should not
>> do reclaim per cgroup at all?
>
> What's the benefit of doing reclaim per cgroup? Is that worth the extra
> complexity?

Without reclaiming per cgroup, we would always have to set the limit to
the enclave's peak usage. That may not utilize EPC efficiently, since in
many cases an enclave performs fine with an EPC limit set below its peak.
Basically, each cgroup cannot give up some pages for the greater good
without dying :-)

Also, for enclaves with EDMM enabled, the peak usage is not static and so
is hard to determine up front. That could be an operational/deployment
inconvenience.

With over-committing (sum of limits > total capacity), a cgroup at peak
usage may require pages to be swapped out of a different cgroup if the
system is overloaded at that time.
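
To put rough numbers on the over-commit case (purely illustrative; the
helper names below are hypothetical and not from the series): with 100 EPC
pages in total and two cgroups each limited to 60, the sum of limits is
120. If both sit at 50 pages and one of them grows by 10 toward its limit,
nothing is free, so 10 pages have to come out of the other cgroup even
though that one is still under its own limit.

```c
#include <stddef.h>

/* Hypothetical helper: are the configured limits over-committed? */
static int toy_is_overcommitted(const size_t *limits, size_t nr_cgroups,
                                size_t capacity)
{
    size_t sum = 0;

    for (size_t i = 0; i < nr_cgroups; i++)
        sum += limits[i];
    return sum > capacity;
}

/*
 * Hypothetical helper: how many pages must be swapped out of *other*
 * cgroups so that one cgroup can allocate nr_requested more pages.
 */
static size_t toy_pages_to_reclaim(size_t capacity, size_t total_in_use,
                                   size_t nr_requested)
{
    size_t free_pages = capacity > total_in_use ? capacity - total_in_use : 0;

    return nr_requested > free_pages ? nr_requested - free_pages : 0;
}
```

With the numbers above, toy_pages_to_reclaim(100, 100, 10) comes to 10.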

> The key question here is whether we want the SGX VM to be complex and
> more like the real VM or simple when a cgroup hits its limit. Right?

Although it's fair to say that most of the complexity in this series is in
the support for per-cgroup reclaim, I think it's manageable and much less
than the real VM now that we have removed the enclave-killing parts: the
only extra effort is to track pages in separate lists and reclaim them
separately, as opposed to tracking them in one global list and reclaiming
them together. The main reclaiming loop is still pretty much the same as
before.
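
As a toy user-space sketch of that point (not code from the series; every
name here is hypothetical and the real EPC reclaimer does far more work
per page), the same reclaim loop can serve either one global list or one
list per cgroup. The only difference is which list it is handed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; not the kernel's SGX data structures. */
struct toy_page {
    struct toy_page *next;
};

/* A list of reclaimable pages, oldest page at the head. */
struct toy_lru {
    struct toy_page *head;
    struct toy_page *tail;
    size_t nr_pages;
};

/* Per-cgroup tracking: each cgroup owns its own list and a limit. */
struct toy_cgroup {
    struct toy_lru lru;
    size_t limit;
};

/* Append a page as the youngest entry of a list. */
static void toy_lru_add(struct toy_lru *lru, struct toy_page *page)
{
    page->next = NULL;
    if (lru->tail)
        lru->tail->next = page;
    else
        lru->head = page;
    lru->tail = page;
    lru->nr_pages++;
}

/*
 * The shared loop: take up to nr_to_scan of the oldest pages off the
 * given list. Returns how many pages were reclaimed.
 */
static size_t toy_reclaim(struct toy_lru *lru, size_t nr_to_scan)
{
    size_t reclaimed = 0;

    while (reclaimed < nr_to_scan && lru->head) {
        struct toy_page *page = lru->head;

        lru->head = page->next;
        if (!lru->head)
            lru->tail = NULL;
        page->next = NULL;
        lru->nr_pages--;
        reclaimed++;
    }
    return reclaimed;
}

/*
 * Charging a page to a cgroup at its limit reclaims from that cgroup's
 * own list first, instead of failing or killing anything.
 */
static void toy_cgroup_charge(struct toy_cgroup *cg, struct toy_page *page)
{
    assert(cg->limit > 0);
    if (cg->lru.nr_pages >= cg->limit)
        toy_reclaim(&cg->lru, cg->lru.nr_pages - cg->limit + 1);
    toy_lru_add(&cg->lru, page);
}
```

Global reclaim would call toy_reclaim() on a single system-wide list;
per-cgroup reclaim calls it on cg->lru of whichever cgroup hit its limit.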

> If stopping at patch 5 and having less code is even remotely an option,
> why not do _that_?

I hope I described the limitations clearly enough above.
If those are OK with users and would also make the series acceptable for
merging quickly, I'm happy to do that :-)

Thanks
Haitao