Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

"Jarkko Sakkinen" <jarkko@xxxxxxxxxx> · Mon, 12 Feb 2024 21:55:46 +0200

On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> From: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
>
> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> reclaim pages used in the same cgroup to make room for new allocations.
> This is analogous to the behavior that the global reclaimer is triggered
> when the global usage is close to total available EPC.
>
> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> whether synchronous reclaim is allowed or not. And trigger the
> synchronous/asynchronous reclamation flow accordingly.
>
> Note at this point, all reclaimable EPC pages are still tracked in the
> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
> is activated yet.
>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> ---
> V7:
> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> ---
>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 ++++++++++++++++++++++++--
>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
>  arch/x86/kernel/cpu/sgx/main.c       |  2 +-
>  3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> index d399fda2b55e..abf74fdb12b4 100644
> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> @@ -184,13 +184,35 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
>  /**
>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
>   * @epc_cg:	The EPC cgroup to be charged for the page.
> + * @reclaim:	Whether or not synchronous reclaim is allowed
>   * Return:
>   * * %0 - If successfully charged.
>   * * -errno - for failures.
>   */
> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
>  {
> -	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> +	for (;;) {
> +		if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> +					PAGE_SIZE))
> +			break;
> +
> +		if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> +			return -ENOMEM;
> + +		if (signal_pending(current))
> +			return -ERESTARTSYS;
> +
> +		if (!reclaim) {
> +			queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
> +			return -EBUSY;
> +		}
> +
> +		if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> +			/* All pages were too young to reclaim, try again a little later */
> +			schedule();

This will be total pain to backtrack after a while when something
needs to be changed so there definitely should be inline comments
addressing each branch condition.

I'd rethink this as:

1. Create static __sgx_epc_cgroup_try_charge() for addressing single
   iteration with the new "reclaim" parameter.
2. Add a new sgx_epc_group_try_charge_reclaim() function.

There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
sgx_epc_cgroup_try_charge_reclaim() because both have almost the
same loop calling internal __sgx_epc_cgroup_try_charge() with
different parameters. That is totally acceptable.

Please also add my suggested-by.

BR, Jarkko

BR, Jarkko