On Fri, 2024-04-19 at 13:55 -0500, Haitao Huang wrote:
> On Thu, 18 Apr 2024 20:32:14 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
>
> >
> >
> > On 16/04/2024 3:20 pm, Haitao Huang wrote:
> > > From: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > > In cases EPC pages need be allocated during a page fault and the cgroup
> > > usage is near its limit, an asynchronous reclamation needs be triggered
> > > to avoid blocking the page fault handling.
> > > Create a workqueue, corresponding work item and function definitions
> > > for EPC cgroup to support the asynchronous reclamation.
> > > In case the workqueue allocation is failed during init, disable cgroup.
> >
> > It's fine and reasonable to disable (SGX EPC) cgroup.  The problem is
> > "exactly what does this mean" isn't quite clear.
> >
> First, this is really some corner case most people don't care: during
> init, kernel can't even allocate a workqueue object. So I don't think we
> should write extra code to implement some sophisticated solution. Any
> solution we come up with may just not work as the way user want or solve
> the real issue due to the fact such allocation failure even happens at
> init time.

I think for such a boot-time failure we can either BUG_ON() directly, or
try to handle it _nicely_, but not half-way.  In my experience adding
BUG_ON() should be avoided in general, but it might be acceptable during
kernel boot.  I will leave that to others.

[...]

> >
> > ..., IIUC you choose a (third) solution that is even one more step back:
> >
> > It just makes try_charge() always succeed, but EPC pages are still
> > managed in the "per-cgroup" list.
> >
> > But this solution, AFAICT, doesn't work.  The reason is when you fail to
> > allocate EPC page you will do the global reclaim, but now the global
> > list is empty.
> >
> > Am I missing anything?
> >
> But when cgroups enabled in config, global reclamation starts from root
> and reclaim from the whole hierarchy if user may still be able to create.
> Just that we don't have async/sync per-cgroup reclaim triggered.

OK.  I missed this as it is in a later patch.

> >
> > So my thinking is, we have two options:
> >
> > 1) Modify the MISC cgroup core code to allow the kernel to disable one
> > particular resource.  It shouldn't be hard, e.g., we can add a
> > 'disabled' flag to the 'struct misc_res'.
> >
> > Hmm.. wait, after checking, the MISC cgroup won't show any control files
> > if the "capacity" of the resource is 0:
> >
> > "
> >  * Miscellaneous resources capacity for the entire machine. 0 capacity
> >  * means resource is not initialized or not present in the host.
> > "
> >
> > So I really suppose we should go with this route, i.e., by just setting
> > the EPC capacity to 0?
> >
> > Note misc_cg_try_charge() will fail if capacity is 0, but we can make it
> > return success by explicitly check whether SGX cgroup is disabled by
> > using a helper, e.g., sgx_cgroup_disabled().
> >
> > And you always return the root SGX cgroup in sgx_get_current_cg() when
> > sgx_cgroup_disabled() is true.
> >
> > And in sgx_reclaim_pages_global(), you do something like:
> >
> > static void sgx_reclaim_pages_global(..)
> > {
> > #ifdef CONFIG_CGROUP_MISC
> > 	if (sgx_cgroup_disabled())
> > 		sgx_reclaim_pages(&sgx_root_cg.lru);
> > 	else
> > 		sgx_cgroup_reclaim_pages(misc_cg_root());
> > #else
> > 	sgx_reclaim_pages(&sgx_global_list);
> > #endif
> > }
> >
> > I am perhaps missing some other spots too but you got the idea.
> >
> > At last, after typing those, I believe we should have a separate patch
> > to handle disable SGX cgroup at initialization time.  And you can even
> > put this patch _somewhere_ after the patch
> >
> > 	"x86/sgx: Implement basic EPC misc cgroup functionality"
> >
> > and before this patch.
> >
> > It makes sense to have such patch anyway, because with it we can easily
> > add a kernel command line "sgx_cgroup=disabled" if the user wants it
> > disabled (when someone has such requirement in the future).
> >
>
> I think we can add support for "sgx_cgroup=disabled" in future if indeed
> needed. But just for init failure, no?
>

It's not about the command line, which we can add in the future when
needed.  It's that we need a way to handle the SGX cgroup being disabled
at boot time nicely, because we already have a case where we need to do
so.

Your approach looks half-way to me, and is not extensible in the future.
If we choose to do it, do it right -- that is, we need a way to disable it
completely in both kernel and userspace so that userspace won't be able to
see it.
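
To make the "capacity 0 means disabled" idea quoted above a bit more
concrete, below is a rough userspace mock-up, not kernel code.  Only the
names sgx_cgroup_disabled(), sgx_get_current_cg(),
sgx_reclaim_pages_global() and sgx_root_cg come from the discussion above;
their signatures here, and everything else (struct epc_lru,
sgx_epc_capacity, sgx_child_cgs, sgx_cgroup_init(), sgx_alloc_page(),
reclaim_from_lru()), are made up purely for illustration.  It assumes a
failed workqueue allocation at init simply leaves the capacity at 0:

```c
/*
 * Userspace mock-up only.  Every type and most helpers are hypothetical
 * stand-ins for illustration; this is not the SGX or misc cgroup code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct epc_lru {
	int nr_pages;			/* pages currently tracked on this list */
};

struct sgx_cgroup {
	struct epc_lru lru;
	unsigned long usage;
	unsigned long max;
};

static struct sgx_cgroup sgx_root_cg;
static struct sgx_cgroup sgx_child_cgs[2];	/* stand-in for the hierarchy */
static unsigned long sgx_epc_capacity;		/* 0: resource not present */

static bool sgx_cgroup_disabled(void)
{
	return sgx_epc_capacity == 0;
}

/* Assumption: a failed workqueue allocation leaves the capacity at 0. */
static void sgx_cgroup_init(bool wq_alloc_ok, unsigned long total_pages)
{
	size_t i;

	memset(&sgx_root_cg, 0, sizeof(sgx_root_cg));
	memset(sgx_child_cgs, 0, sizeof(sgx_child_cgs));
	sgx_root_cg.max = total_pages;
	for (i = 0; i < 2; i++)
		sgx_child_cgs[i].max = total_pages;
	sgx_epc_capacity = wq_alloc_ok ? total_pages : 0;
}

/* When disabled, every caller is accounted to the root cgroup. */
static struct sgx_cgroup *sgx_get_current_cg(struct sgx_cgroup *task_cg)
{
	return sgx_cgroup_disabled() ? &sgx_root_cg : task_cg;
}

/* Charging always succeeds when disabled; otherwise enforce the limit. */
static int sgx_cgroup_try_charge(struct sgx_cgroup *cg)
{
	if (sgx_cgroup_disabled())
		return 0;
	if (cg->usage + 1 > cg->max)
		return -1;
	cg->usage++;
	return 0;
}

static int sgx_alloc_page(struct sgx_cgroup *cg)
{
	assert(cg != NULL);
	if (sgx_cgroup_try_charge(cg))
		return -1;
	cg->lru.nr_pages++;
	return 0;
}

/* Take up to nr pages off one list; uncharging is omitted for brevity. */
static int reclaim_from_lru(struct epc_lru *lru, int nr)
{
	int n = lru->nr_pages < nr ? lru->nr_pages : nr;

	lru->nr_pages -= n;
	return n;
}

/* Disabled: only the root list holds pages.  Enabled: walk everything. */
static int sgx_reclaim_pages_global(int nr)
{
	int done;
	size_t i;

	if (sgx_cgroup_disabled())
		return reclaim_from_lru(&sgx_root_cg.lru, nr);

	done = reclaim_from_lru(&sgx_root_cg.lru, nr);
	for (i = 0; i < 2 && done < nr; i++)
		done += reclaim_from_lru(&sgx_child_cgs[i].lru, nr - done);
	return done;
}
```

The point of the sketch is only that, once "disabled" is a single state
checked by one helper, the charge, lookup and global reclaim paths all
agree on where the pages live, so global reclaim never looks at an empty
list.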