Re: [PATCH] x86/sgx: Avoid softlockup from sgx_vepc_release

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kai,

On Tue, Aug 15, 2023 at 1:58 AM Huang, Kai <kai.huang@xxxxxxxxx> wrote:
>
> On Mon, 2023-08-14 at 06:37 +0200, Jack Wang wrote:
> > We hit softlocup with following call trace:
> >  kernel: [ 2041.288210] Call Trace:
> > kernel: [ 2041.288213]  <IRQ>
> > kernel: [ 2041.288216]  ? watchdog_timer_fn+0x1b4/0x210
> > kernel: [ 2041.288222]  ? lockup_detector_update_enable+0x50/0x50
> > kernel: [ 2041.288225]  ? __hrtimer_run_queues+0x127/0x280
> > kernel: [ 2041.288230]  ? hrtimer_interrupt+0xfc/0x210
> > kernel: [ 2041.288233]  ? __sysvec_apic_timer_interrupt+0x59/0xd0
> > kernel: [ 2041.288237]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> > kernel: [ 2041.288242]  </IRQ>
> > kernel: [ 2041.288242]  <TASK>
> > kernel: [ 2041.288244]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > kernel: [ 2041.288250]  xa_erase+0x21/0xb0
> > kernel: [ 2041.288256]  ? sgx_free_epc_page+0x20/0x50
> > kernel: [ 2041.288260]  sgx_vepc_release+0x75/0x220
> > kernel: [ 2041.288263]  __fput+0x89/0x250
> > kernel: [ 2041.288269]  task_work_run+0x59/0x90
> > kernel: [ 2041.288275]  do_exit+0x337/0x9a0
> >
> > Similar like 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
> > The test system has 64GB of enclave memory, and all assigned to a single
> > VM. Release vepc take longer time and triggers the softlockup warning.
> >
> > Add a cond_resched() to give other tasks a chance to run and placate
> > the softlockup detector.
> >
> > Cc: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > Cc: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > Reported-by: Yu Zhang <yu.zhang@xxxxxxxxx>
> >
> > Signed-off-by: Jack Wang <jinpu.wang@xxxxxxxxx>
> > ---
> >  arch/x86/kernel/cpu/sgx/virt.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > index c3e37eaec8ec..01d2795792cc 100644
> > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -204,6 +204,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> >                       continue;
> >
> >               xa_erase(&vepc->page_array, index);
> > +             cond_resched();
> >       }
> >
> >       /*
> > @@ -222,6 +223,7 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> >                       list_add_tail(&epc_page->list, &secs_pages);
> >
> >               xa_erase(&vepc->page_array, index);
> > +             cond_resched();
> >       }
> >
> >       /*
>
> This adds cond_resched() to the first two loops: releasing non-SECS pages and
> releasing SECS pages which don't have children on remote vepc instance(s).
> There is a third round loop to release SECS pages which have children on remote
> vepc instance(s).  I think it's better to add cond_resched() there too, although
> it's extremely unlikely it could trigger softlockup w/o cond_resched().

I also think the third one are extremely unlikely could cause softlockup.
And our tests do not hit it either.

>
> Also similar to 8795359e35bc ("x86/sgx: Silence softlockup detection when
> releasing large enclaves"), this looks stable-kernel material.  So perhaps add:
>
> Cc: stable@xxxxxxxxxxxxxxx

I've added the fixed tag and cc stable in the v2 patch.

Thx for the feedback and review.




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux