On Thu, Jun 03, 2021 at 09:37:52PM +0000, Sean Christopherson wrote:
> On Thu, Jun 03, 2021, Jarkko Sakkinen wrote:
> > On Wed, Jun 02, 2021 at 11:36:43AM +0800, Du Cheng wrote:
> > > Hi,
> > >
> > > I'd like to report a bug on my Linux box running the mainline kernel, version:
> > > commit 8124c8a6b35386f73523d27eacb71b5364a68c4c tag: v5.13-rc4
> > >
> > > After it boots on my Intel NUC, I encounter this error in the console log. I
> > > believe it is triggered by a WARN_ON():
> > >
> > > [    0.628094] sgx: EPC section 0x30200000-0x35f7ffff
> > > [    0.628503] ------------[ cut here ]------------
> > > [    0.628506] WARNING: CPU: 6 PID: 127 at arch/x86/kernel/cpu/sgx/main.c:428 ksgxd+0x1c8/0x1e0
> > >
> > >
> > > I have attached the config file I used to compile the kernel, just in case it is helpful.
> > >
> > > I am running Ubuntu 21.04 with the mainline kernel, and my box is an Intel NUC:
> > >
> > > Product Name: NUC10i5FNH
> > > SKU Number: BXNUC10i5FNH
> > > Product Name: NUC10i5FNB
> >
> > Is it possible to test with 5.12?
> >
> > Linux does not support that hardware, except for KVM guests, support for which
> > was added in 5.13.
>
> I'm pretty sure that the issue is kthread_stop() being called on ksgxd before
> __sgx_sanitize_pages() completes, and that the lack of launch control is what is
> exposing the bug.
>
> Prior to adding KVM support, sgx_init() bailed immediately because
> X86_FEATURE_SGX was cleared if X86_FEATURE_SGX_LC was unsupported.
>
> With KVM support, sgx_drv_init() handles the X86_FEATURE_SGX_LC check manually,
> so now there's an easy-to-hit case where sgx_init() will spawn ksgxd and _then_
> fail to initialize, which results in sgx_init() stopping ksgxd before it finishes
> sanitizing the EPC.
>
> The bug existed before KVM support; it was just much harder to hit because it
> basically required char device registration to fail.
>
> This should suppress the WARN if ksgxd is stopped early.
>
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 63d3de02bbcc..bdf31ddfb10d 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -425,7 +425,7 @@ static int ksgxd(void *p)
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
>
>  	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	WARN_ON(!list_empty(&sgx_dirty_page_list) && !kthread_should_stop());
>
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())
>
>
> If that works, then
>
> Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
>
> is probably most appropriate.

Since this could theoretically happen already in 5.11, I agree that this is the
right commit.

Can you send a proper patch? I can also mangle a patch myself if you don't have
the bandwidth. What you wrote above would work as the commit message.

/Jarkko
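The early-stop scenario, and why the WARN_ON() change above silences it, can be modeled in userspace. This is a hypothetical sketch, not kernel code: struct ksgxd_model, model_should_stop(), model_sanitize() and ksgxd_model_warns() are invented stand-ins for kthread_should_stop(), __sgx_sanitize_pages() and the sanity check in ksgxd(). It assumes, as the analysis above implies, that the sanitize loop returns as soon as a stop has been requested, leaving pages on the dirty list.

```c
#include <assert.h>   /* used by the caller's checks */
#include <stdbool.h>

/*
 * Hypothetical userspace model of the ksgxd start-up race; none of
 * these names exist in the kernel.
 */
struct ksgxd_model {
	int dirty_pages; /* stands in for sgx_dirty_page_list */
	int stop_after;  /* pages sanitized before "kthread_stop()", -1 = never */
	int sanitized;   /* pages sanitized so far */
};

/* Stands in for kthread_should_stop(): stays true once the stop point is hit. */
static bool model_should_stop(const struct ksgxd_model *m)
{
	return m->stop_after >= 0 && m->sanitized >= m->stop_after;
}

/* Stands in for __sgx_sanitize_pages(): assumed to return early on stop. */
static void model_sanitize(struct ksgxd_model *m)
{
	while (m->dirty_pages > 0) {
		if (model_should_stop(m))
			return;
		m->dirty_pages--;
		m->sanitized++;
	}
}

/*
 * Runs the model and reports whether the sanity check would WARN.
 * patched == false: WARN_ON(!list_empty(...))
 * patched == true:  WARN_ON(!list_empty(...) && !kthread_should_stop())
 */
bool ksgxd_model_warns(int dirty_pages, int stop_after, bool patched)
{
	struct ksgxd_model m = { dirty_pages, stop_after, 0 };
	bool list_is_empty;

	model_sanitize(&m);
	list_is_empty = (m.dirty_pages == 0);

	if (patched)
		return !list_is_empty && !model_should_stop(&m);
	return !list_is_empty;
}
```

For example, ksgxd_model_warns(8, 3, false) models the original check with a stop after three of eight pages and reports a warning, while the same call with patched set to true does not. Without an early stop (stop_after of -1) the list drains completely and neither variant warns, so the patched check still catches pages left dirty during a normal run.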