> From this perspective, I think the current implementation is
> "well-defined": EPC cgroup limits for VMs are only enforced at VM launch
> time, not runtime. In practice, SGX VM can be launched only with fixed
> EPC size and all those EPCs are fully committed to the VM once launched.
> Because of that, I imagine people are using VMs to primarily partition the
> physical EPCs, i.e, the static size itself is the 'limit' for the workload
> of a single VM and not expecting EPCs taken away at runtime.
>
> So killing does not really add much value for the existing usages IIUC.

It's not about adding value to the existing usages. It's about fixing the
issue that arises when we lower the EPC limit to a point below the total
virtual EPC size. That is a design issue, or simply a bug in the current
implementation, and we need to fix it.

> That said, I don't anticipate adding the enforcement of killing VMs at
> runtime would break such usages as admin/user can simply choose to set the
> limit equal to the static size to launch the VM and forget about it.
>
> Given that, I'll propose an add-on patch to this series as RFC and have
> some feedback from community before we decide if that needs be included in
> first version or we can skip it until we have EPC reclaiming for VMs.

I don't understand what "add-on" patch you are talking about.

I mentioned the "typical deployment thing" because it can help us
understand which algorithm is the better way to select the victim. But no
matter what we choose, we still need to fix the bug mentioned above.

I really think you should just go the simple way: when you want to take
EPC back from a VM, kill the VM.

Another bad thing about "just removing EPC pages from VM" is that the
enclaves in the VM would suffer a "sudden loss of EPC", or even worse,
suffer it at a high frequency. Although we depend on that for supporting
SGX VM live migration, it needs to be avoided if possible.
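
To make the "kill the VM" option concrete, here is a rough userspace
sketch. It is not kernel code and not taken from the series. The struct,
the function names and the largest-first victim policy are all made up
for illustration. Which victim-selection algorithm is actually best is the
open question discussed above.

```c
#include <stddef.h>

/* Hypothetical model of one SGX VM; not a kernel structure. */
struct vm {
	const char *name;
	unsigned long epc_pages;	/* static virtual EPC size, fully committed */
	int alive;
};

/* Sum of virtual EPC committed to VMs that are still running. */
static unsigned long total_epc(const struct vm *vms, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		if (vms[i].alive)
			sum += vms[i].epc_pages;
	return sum;
}

/*
 * Enforce a (possibly lowered) limit by killing whole VMs until the
 * committed virtual EPC fits. No pages are ever removed from a VM that
 * keeps running. Largest-first is only a placeholder policy.
 * Returns the number of VMs killed.
 */
static size_t enforce_limit(struct vm *vms, size_t n, unsigned long limit)
{
	size_t killed = 0;

	while (total_epc(vms, n) > limit) {
		struct vm *victim = NULL;

		for (size_t i = 0; i < n; i++)
			if (vms[i].alive &&
			    (!victim || vms[i].epc_pages > victim->epc_pages))
				victim = &vms[i];
		if (!victim)
			break;	/* nothing left to kill */
		victim->alive = 0;
		killed++;
	}
	return killed;
}
```

With three VMs of 100, 300 and 200 pages, lowering the limit from 600 to
350 kills only the 300-page VM in this model, and the surviving VMs keep
their full static EPC.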