On Mon, Oct 16, 2023, Haitao Huang wrote:
> Hi Sean
>
> On Mon, 16 Oct 2023 16:32:31 -0500, Sean Christopherson <seanjc@xxxxxxxxxx>
> wrote:
>
> > On Mon, Oct 16, 2023, Haitao Huang wrote:
> > > From this perspective, I think the current implementation is
> > > "well-defined":
> > > EPC cgroup limits for VMs are only enforced at VM launch time, not
> > > runtime. In practice, an SGX VM can be launched only with a fixed EPC
> > > size, and all of that EPC is fully committed to the VM once launched.
> >
> > Fully committed doesn't mean those numbers are reflected in the cgroup. A
> > VM scheduler can easily "commit" EPC to a guest, but allocate EPC on
> > demand, i.e. when the guest attempts to actually access a page.
> > Preallocating memory isn't free, e.g. it can slow down guest boot, so it's
> > entirely reasonable to have virtual EPC be allocated on-demand. Enforcing
> > at launch time doesn't work for such setups, because from the cgroup's
> > perspective, the VM is using 0 pages of EPC at launch.
> >
> Maybe I understood the current implementation wrong. From what I see, it is
> impossible for vEPC not to be fully committed at launch time. The guest
> would EREMOVE all pages during initialization, resulting in #PF and all
> pages being allocated. This essentially makes "prealloc=off" the same as
> "prealloc=on".
> Unless you are talking about some custom OS or kernel other than upstream
> Linux here?

Yes, a customer could be running an older kernel, something other than Linux,
a custom kernel, an out-of-tree SGX driver, etc. The host should never assume
anything about the guest kernel when it comes to correctness (unless the guest
kernel is controlled by the host).

Doing EREMOVE on all pages is definitely not mandatory, especially if the
kernel detects a hypervisor, i.e. knows it's running as a guest.
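
To make the accounting argument concrete, here is a rough toy model in Python. It is only a sketch and not the kernel implementation: EpcCgroup, VirtualEpc, launch() and touch() are hypothetical names invented for illustration, and "charging" is reduced to a page counter. It shows that with on-demand allocation the cgroup sees 0 pages at launch, so a check done only at launch cannot stop the guest from later running into the limit.

```python
from dataclasses import dataclass, field


@dataclass
class EpcCgroup:
    """Toy EPC cgroup (hypothetical): counts charged pages against a limit."""
    limit: int
    charged: int = 0

    def try_charge(self, pages: int) -> bool:
        # Refuse the charge if it would push usage past the limit.
        if self.charged + pages > self.limit:
            return False
        self.charged += pages
        return True


@dataclass
class VirtualEpc:
    """Toy vEPC region (hypothetical): pages are backed on first touch
    unless prealloc is set, in which case they are all backed at launch."""
    cgroup: EpcCgroup
    size: int
    prealloc: bool = False
    backed: set = field(default_factory=set)

    def launch(self) -> bool:
        # A launch-time check only sees what is charged at this moment.
        if self.prealloc:
            if not self.cgroup.try_charge(self.size):
                return False
            self.backed = set(range(self.size))
        return True

    def touch(self, page: int) -> bool:
        # Models the fault taken when the guest first accesses a page.
        if not 0 <= page < self.size:
            raise IndexError("page outside the vEPC region")
        if page in self.backed:
            return True
        if not self.cgroup.try_charge(1):
            return False
        self.backed.add(page)
        return True


# On-demand guest: launch passes with nothing charged, the limit bites later.
cg = EpcCgroup(limit=4)
vm = VirtualEpc(cgroup=cg, size=8)
launched = vm.launch()
charged_at_launch = cg.charged
touch_results = [vm.touch(p) for p in range(vm.size)]

# Preallocating guest: the same oversized request is rejected at launch.
cg_pre = EpcCgroup(limit=4)
vm_pre = VirtualEpc(cgroup=cg_pre, size=8, prealloc=True)
launched_pre = vm_pre.launch()
```

In this model, a guest that EREMOVEs every page during boot is equivalent to calling touch() on the whole region right after launch, which is why it looks like "prealloc=on". A guest that skips that step leaves most pages uncharged until it actually uses them, so the host cannot rely on launch-time enforcement alone.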