On Tue, 2024-06-18 at 18:23 -0500, Haitao Huang wrote: > On Tue, 18 Jun 2024 18:15:37 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote: > > > On Tue, 2024-06-18 at 07:56 -0500, Haitao Huang wrote: > > > On Tue, 18 Jun 2024 06:31:09 -0500, Huang, Kai <kai.huang@xxxxxxxxx> > > > wrote: > > > > > > > > > > > > @@ -921,7 +956,8 @@ static int __init sgx_init(void) > > > > > if (!sgx_page_cache_init()) > > > > > return -ENOMEM; > > > > > > > > > > - if (!sgx_page_reclaimer_init()) { > > > > > + if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) { > > > > > + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0); > > > > > ret = -ENOMEM; > > > > > goto err_page_cache; > > > > > } > > > > > > > > This code change is wrong due to two reasons: > > > > > > > > 1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init() > > > > failed, you actually need to 'goto err_kthread' because the ksgxd() > > > > kernel > > > > thread is already created and is running. > > > > > > > > 2) There are other cases after here that can also result in > > > sgx_init() to > > > > fail completely, e.g., registering sgx_dev_provision mics device. We > > > > need > > > > to reset the capacity to 0 for those cases as well. > > > > > > > > AFAICT, you need something like: > > > > > > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c > > > > b/arch/x86/kernel/cpu/sgx/main.c > > > > index 27892e57c4ef..46f9c26992a7 100644 > > > > --- a/arch/x86/kernel/cpu/sgx/main.c > > > > +++ b/arch/x86/kernel/cpu/sgx/main.c > > > > @@ -930,6 +930,10 @@ static int __init sgx_init(void) > > > > if (ret) > > > > goto err_kthread; > > > > + ret = sgx_cgroup_init(); > > > > + if (ret) > > > > + goto err_provision; > > > > + > > > > /* > > > > * Always try to initialize the native *and* KVM drivers. > > > > * The KVM driver is less picky than the native one and > > > > @@ -941,10 +945,12 @@ static int __init sgx_init(void) > > > > ret = sgx_drv_init(); > > > > if (sgx_vepc_init() && ret) > > > > - goto err_provision; > > > > + goto err_cgroup; > > > > return 0; > > > > +err_cgroup: > > > > + /* SGX EPC cgroup cleanup */ > > > > err_provision: > > > > misc_deregister(&sgx_dev_provision); > > > > @@ -952,6 +958,8 @@ static int __init sgx_init(void) > > > > kthread_stop(ksgxd_tsk); > > > > err_page_cache: > > > > + misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0); > > > > + > > > > for (i = 0; i < sgx_nr_epc_sections; i++) { > > > > vfree(sgx_epc_sections[i].pages); > > > > memunmap(sgx_epc_sections[i].virt_addr); > > > > > > > > > > > > I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(), > > > > otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup() > > > > respectively when sgx_cgroup_init() fails. > > > > > > > > > > Yes, good catch. > > > > > > > This looks a little bit weird too, though: > > > > > > > > Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at > > > end > > > > of sgx_init() error path, because the "set capacity" part is done in > > > > sgx_epc_cache_init(). > > > > But logically, both "set capacity" and "reset capacity to 0" should be > > > > SGX > > > > EPC cgroup operation, so it's more reasonable to do "set capacity" in > > > > sgx_cgroup_init() and do "reset to 0" in the > > > > > > > > /* SGX EPC cgroup cleanup */ > > > > > > > > as shown above. > > > > > > > > Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to > > > free > > > > the workqueue, so it's odd to have two places to handle EPC cgroup > > > > cleanup. > > > > > > > > I understand the reason "set capacity" part is done in > > > > sgx_page_cache_init() now is because in that function you can easily > > > get > > > > the capacity. But the fact is @sgx_numa_nodes also tracks EPC size > > > for > > > > each node, so you can also get the total EPC size from @sgx_numa_node > > > in > > > > sgx_cgroup_init() and set capacity there. > > > > > > > > In this case, you can put "reset capacity to 0" and "free workqueue" > > > > together as the "SGX EPC cgroup cleanup", which is way more clear > > > IMHO. > > > > > > > Okay, will expose @sgx_numa_nodes to epc_cgroup.c and do the > > > calculations > > > in sgx_cgroup_init(). > > > > > > > Looks you will also need to expose @sgx_numa_mask, which looks overkill. > > > > Other options: > > > > 1) Expose a function to return total EPC pages/size in "sgx.h". > > > > 2) Move out the new 'capacity' variable in this patch as a global > > variable > > and expose it in "sgx.h" (perhaps rename to 'sgx_total_epc_pages/size'). > > > > 3) Make sgx_cgroup_init() to take an argument of total EPC pages/size, > > and > > pass it in sgx_init(). > > For 3) there are also options to get total EPC pages/size: > > > > a) Move out the new 'capacity' variable in this patch as a static. > > > > b) Add a function to calculate total EPC pages/size from sgx_numa_nodes. > > > > Hmm.. I think we can just use option 2)? > > > > > I was about doing this in sgx_cgroup_init(): > for (i = 0; i < num_possible_nodes(); i++) > capacity += sgx_numa_nodes[i].size; > > any concern using num_possible_nodes()? > > I think case handled in sgx_page_cache_init() for a node with no epc (or > mask). Only requirement is sgx_cgroup_init() called after > sgx_page_cache_init(). > All other code uses sgx_numa_mask to tell whether a node has EPC. It would be great if we use this way consistently in all SGX code.