Re: [PATCH v6 09/12] x86/sgx: Restructure top-level EPC reclaim function

"Haitao Huang" <haitao.huang@xxxxxxxxxxxxxxx> · Thu, 04 Jan 2024 13:11:15 -0600

Hi Dave,

On Wed, 03 Jan 2024 10:37:35 -0600, Dave Hansen <dave.hansen@xxxxxxxxx>  
wrote:

> On 12/18/23 13:24, Haitao Huang wrote:
>> @Dave and @Michal, Your thoughts? Or could you confirm we should not
>> do reclaim per cgroup at all?
>
> What's the benefit of doing reclaim per cgroup?  Is that worth the extra
> complexity?

Without per-cgroup reclaim, we always have to set the limit to the
enclave's peak usage. That may not utilize EPC efficiently, since in many
cases an enclave performs fine with an EPC limit set below its peak.
Basically, a cgroup cannot give up some pages for the greater good without
dying :-)

Also, for enclaves with EDMM enabled, the peak usage is not static and is
therefore hard to determine up front. Hence it might be an
operational/deployment inconvenience.

In the case of over-committing (sum of limits > total capacity), one cgroup
at peak usage may require swapping pages out of a different cgroup if the
system is overloaded at that time.
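
To make the over-commit case concrete, here is a minimal user-space sketch of the arithmetic. The function names and the page counts in the usage note are hypothetical and are not taken from the patch series; it only illustrates why a cgroup growing toward its peak can force reclaim elsewhere once the sum of limits exceeds capacity.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the configured limits add up to more
 * than the total EPC capacity (all values in pages).
 */
static int epc_is_overcommitted(const size_t *limits, size_t nr_cgroups,
                                size_t capacity)
{
    size_t sum = 0;

    for (size_t i = 0; i < nr_cgroups; i++)
        sum += limits[i];
    return sum > capacity;
}

/*
 * Hypothetical helper: number of pages that must be reclaimed from
 * somewhere (possibly another cgroup) before `want` new pages can be
 * allocated. Assumes total_used <= capacity.
 */
static size_t epc_pages_to_reclaim(size_t capacity, size_t total_used,
                                   size_t want)
{
    size_t free_pages = capacity - total_used;

    return want > free_pages ? want - free_pages : 0;
}
```

For example, with a capacity of 100 pages and limits of 80 and 60, the system is over-committed. If both cgroups currently use 50 pages and the first one wants 30 more to reach its peak, all 30 have to be reclaimed, even though that cgroup is still within its own limit.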

> The key question here is whether we want the SGX VM to be complex and
> more like the real VM or simple when a cgroup hits its limit.  Right?

Although it's fair to say that the majority of the complexity in this
series is in the support for per-cgroup reclaim, I think it's manageable
and much less than the real VM now that we have removed the
enclave-killing parts: the only extra effort is to track pages in separate
lists and reclaim them separately, as opposed to tracking them in one
global list and reclaiming them together. The main reclaiming loop code is
still pretty much the same as before.
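
As a rough illustration of that point, here is a simplified user-space model, not the code from the series. All struct and function names are hypothetical. The same scan loop works on any list; the per-cgroup variant just hands it the cgroup's own list instead of a global one.

```c
#include <stddef.h>

/* Hypothetical model of a reclaimable EPC page. */
struct epc_page {
    struct epc_page *next;
};

/* Hypothetical LRU list: pages are added at the tail, reclaimed from the head. */
struct epc_lru {
    struct epc_page *head;
    struct epc_page *tail;
    size_t nr_pages;
};

/* Hypothetical cgroup: its own LRU list plus a limit in pages. */
struct epc_cgroup {
    struct epc_lru lru;
    size_t limit;
};

static void lru_add(struct epc_lru *lru, struct epc_page *page)
{
    page->next = NULL;
    if (lru->tail)
        lru->tail->next = page;
    else
        lru->head = page;
    lru->tail = page;
    lru->nr_pages++;
}

/*
 * The reclaim loop: detach up to nr_to_scan of the oldest pages from the
 * given list. It does not care whether the list is global or per cgroup.
 */
static size_t lru_reclaim(struct epc_lru *lru, size_t nr_to_scan)
{
    size_t reclaimed = 0;

    while (reclaimed < nr_to_scan && lru->head) {
        struct epc_page *page = lru->head;

        lru->head = page->next;
        if (!lru->head)
            lru->tail = NULL;
        page->next = NULL;
        lru->nr_pages--;
        reclaimed++;
    }
    return reclaimed;
}

/*
 * Add a page to a cgroup. If the cgroup is already at its limit, reclaim
 * from that cgroup's own list first instead of failing the allocation.
 * Returns the number of pages reclaimed to make room.
 */
static size_t cgroup_add_page(struct epc_cgroup *cg, struct epc_page *page)
{
    size_t reclaimed = 0;

    if (cg->lru.nr_pages >= cg->limit)
        reclaimed = lru_reclaim(&cg->lru, cg->lru.nr_pages - cg->limit + 1);
    lru_add(&cg->lru, page);
    return reclaimed;
}
```

In this model, a cgroup with a limit of 2 that adds a third page gives up its oldest page and stays at 2, while other cgroups' lists are untouched. A global-only design would call lru_reclaim() on a single shared list instead.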

> If stopping at patch 5 and having less code is even remotely an option,
> why not do _that_?

I hope I described the limitations clearly enough above.
If those are OK with users and also make the series acceptable for a quick
merge, I'm happy to do that :-)

Thanks
Haitao