Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for cgroup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 23/02/2024 6:20 am, Haitao Huang wrote:
On Wed, 21 Feb 2024 05:00:27 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:
[...]
>
> Here the @nr_to_scan is reduced by the number of pages that are
> isolated, but
> not actually reclaimed (which is reflected by @cnt).
>
> IIUC, looks you want to make this function do "each cycle" as what you
> mentioned
> in the v8 [1]:
>
>     I tested with that approach and found we can only target number of
> pages
>     attempted to reclaim not pages actually reclaimed due to the
> uncertainty
>     of how long it takes to reclaim pages. Besides targeting number of
>     scanned pages for each cycle is also what the ksgxd does.
>
>     If we target actual number of pages, sometimes it just takes too long.
> I
>     saw more timeouts with the default time limit when running parallel
>     selftests.
>
> I am not sure what does "sometimes it just takes too long" mean, but
> what I am
> thinking is you are trying to do some perfect but yet complicated code
> here.

I think what I observed was that the try_charge() would block too long
before getting chance of schedule() to yield, causing more timeouts than
necessary.
I'll do some re-test to be sure.

Looks this is a valid information that can be used to justify whatever you are
implementing in the EPC cgroup reclaiming function(s).

I'll add some comments. Was assuming this is just following the old design as ksgxd. There were some comments at the beginning of sgx_epc_cgrooup_reclaim_page().
         /*
          * Attempting to reclaim only a few pages will often fail and is
         * inefficient, while reclaiming a huge number of pages can result in          * soft lockups due to holding various locks for an extended duration.
          */
         unsigned int nr_to_scan = SGX_NR_TO_SCAN;

I think it can be improved to emphasize we only "attempt" to finish scanning fixed number of pages for reclamation, not enforce number of pages successfully reclaimed.

Not sure need to be this comment, but at somewhere just state you are trying to follow the ksgxd() (the current sgx_reclaim_pages()), but trying to do it "_across_ given cgroup and all the descendants".

That's the reason you made @nr_to_scan as a pointer.

And also some text to explain why to follow ksgxd() -- not wanting to block longer due to loop over descendants etc -- so we can focus on discussing whether such justification is reasonable.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux