Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for cgroup

"Huang, Kai" <kai.huang@xxxxxxxxx> · Wed, 21 Feb 2024 11:00:27 +0000

On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:
> [...]
> > 
> > Here the @nr_to_scan is reduced by the number of pages that are  
> > isolated, but
> > not actually reclaimed (which is reflected by @cnt).
> > 
> > IIUC, looks you want to make this function do "each cycle" as what you  
> > mentioned
> > in the v8 [1]:
> > 
> > 	I tested with that approach and found we can only target number of
> > pages
> > 	attempted to reclaim not pages actually reclaimed due to the
> > uncertainty
> > 	of how long it takes to reclaim pages. Besides targeting number of
> > 	scanned pages for each cycle is also what the ksgxd does.
> > 
> > 	If we target actual number of pages, sometimes it just takes too long.
> > I
> > 	saw more timeouts with the default time limit when running parallel
> > 	selftests.
> > 
> > I am not sure what does "sometimes it just takes too long" mean, but  
> > what I am
> > thinking is you are trying to do some perfect but yet complicated code  
> > here.
> 
> I think what I observed was that the try_charge() would block too long  
> before getting chance of schedule() to yield, causing more timeouts than  
> necessary.
> I'll do some re-test to be sure.

Looks this is a valid information that can be used to justify whatever you are
implementing in the EPC cgroup reclaiming function(s).

> 
> > 
> > For instance, I don't think selftest reflect the real workload, and I  
> > believe
> > adjusting the limit of a given EPC cgroup shouldn't be a frequent  
> > operation,
> > thus it is acceptable to use some easy-maintain code but less perfect  
> > code.
> > 
> > Here I still think having @nr_to_scan as a pointer is over-complicated.   
> > For
> > example, we can still let sgx_reclaim_pages() to always scan  
> > SGX_NR_TO_SCAN
> > pages, but give up when there's enough pages reclaimed or when the EPC  
> > cgroup
> > and its descendants have been looped:
> > 
> > unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
> > {
> > 	unsigned int cnt = 0;
> > 	...
> > 
> > 	css_for_each_descendant_pre(pos, css_root) {
> > 		...
> > 		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> > 		cnt += sgx_reclaim_pages(&epc_cg->lru);
> > 
> > 		if (cnt >= SGX_NR_TO_SCAN)
> > 			break;
> > 	}
> > 
> > 	...
> > 	return cnt;
> > }
> > 
> > Yeah it may reclaim more than SGX_NR_TO_SCAN when the loop actually  
> > reaches any
> > descendants, but that should be rare and we don't care that much, do we?
> > 
> I assume you meant @cnt here to be number of pages actually reclaimed.  

Yes.

> This could cause  sgx_epc_cgroup_reclaim_pages() block too long as @cnt  
> may always be zero (all pages are too young) and you have to loop all  
> descendants.

I am not sure whether this is a valid point.

For example, your change in patch 10 "x86/sgx: Add EPC reclamation in cgroup
try_charge()" already loops all descendants in below code:

+		if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+			return -ENOMEM;

Anyway, I can see all these can be justification to your design/implementation.
My point is please put these justification in changelog/comments so that we can
actually understand.

Makes sense?