On Wed, Aug 16, 2023 at 10:34 PM Jarkko Sakkinen <jarkko@xxxxxxxxxx> wrote: > > On Tue Aug 15, 2023 at 2:58 AM EEST, Huang, Kai wrote: > > On Mon, 2023-08-14 at 06:37 +0200, Jack Wang wrote: > > > We hit softlocup with following call trace: > > > kernel: [ 2041.288210] Call Trace: > > > kernel: [ 2041.288213] <IRQ> > > > kernel: [ 2041.288216] ? watchdog_timer_fn+0x1b4/0x210 > > > kernel: [ 2041.288222] ? lockup_detector_update_enable+0x50/0x50 > > > kernel: [ 2041.288225] ? __hrtimer_run_queues+0x127/0x280 > > > kernel: [ 2041.288230] ? hrtimer_interrupt+0xfc/0x210 > > > kernel: [ 2041.288233] ? __sysvec_apic_timer_interrupt+0x59/0xd0 > > > kernel: [ 2041.288237] ? sysvec_apic_timer_interrupt+0x6d/0x90 > > > kernel: [ 2041.288242] </IRQ> > > > kernel: [ 2041.288242] <TASK> > > > kernel: [ 2041.288244] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 > > > kernel: [ 2041.288250] xa_erase+0x21/0xb0 > > > kernel: [ 2041.288256] ? sgx_free_epc_page+0x20/0x50 > > > kernel: [ 2041.288260] sgx_vepc_release+0x75/0x220 > > > kernel: [ 2041.288263] __fput+0x89/0x250 > > > kernel: [ 2041.288269] task_work_run+0x59/0x90 > > > kernel: [ 2041.288275] do_exit+0x337/0x9a0 > > > > > > Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") > > > The test system has 64GB of enclave memory, and all assigned to a single > > > VM. Release vepc take longer time and triggers the softlockup warning. > > > > > > Add a cond_resched() to give other tasks a chance to run and placate > > > the softlockup detector. > > > > > > Cc: Jarkko Sakkinen <jarkko@xxxxxxxxxx> > > > Cc: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > > > Reported-by: Yu Zhang <yu.zhang@xxxxxxxxx> > > > > > > Signed-off-by: Jack Wang <jinpu.wang@xxxxxxxxx> > > > --- > > > arch/x86/kernel/cpu/sgx/virt.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c > > > index c3e37eaec8ec..01d2795792cc 100644 > > > --- a/arch/x86/kernel/cpu/sgx/virt.c > > > +++ b/arch/x86/kernel/cpu/sgx/virt.c > > > @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file) > > > continue; > > > > > > xa_erase(&vepc->page_array, index); > > > + cond_resched(); > > > } > > > > > > /* > > > @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file) > > > list_add_tail(&epc_page->list, &secs_pages); > > > > > > xa_erase(&vepc->page_array, index); > > > + cond_resched(); > > > } > > > > > > /* > > > > This adds cond_resched() to the first two loops: releasing non-SECS pages and > > releasing SECS pages which don't have children on remote vepc instance(s). > > There is a third round loop to release SECS pages which have children on remote > > vepc instance(s). I think it's better to add cond_resched() there too, although > > it's extremely unlikely it could trigger softlockup w/o cond_resched(). > > > > Also similar to 8795359e35bc ("x86/sgx: Silence softlockup detection when > > releasing large enclaves"), this looks stable-kernel material. So perhaps add: > > > > Cc: stable@xxxxxxxxxxxxxxx > > + would not hurt to have also fixes tag for clarity. > > BR, Jarkko I did it in v2 here, https://lore.kernel.org/linux-sgx/3b93dfa438fb2f42f917393c3752ffc2dec35e52.camel@xxxxxxxxx/T/#t Thx for the review! >