Re: [PATCH v2] x86/sgx: Fix deadlock and race conditions between fork() and EPC reclaim

On Thu, Apr 09, 2020 at 12:13:53PM -0700, Sean Christopherson wrote:
> On Mon, Apr 06, 2020 at 08:15:56PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Apr 06, 2020 at 08:10:29PM +0300, Jarkko Sakkinen wrote:
> > > On Mon, Apr 06, 2020 at 07:36:38AM -0700, Sean Christopherson wrote:
> > > > On Sat, Apr 04, 2020 at 04:12:02AM +0300, Jarkko Sakkinen wrote:
> > > > > On Fri, Apr 03, 2020 at 04:42:39PM -0700, Sean Christopherson wrote:
> > > > > > On Fri, Apr 03, 2020 at 12:35:50PM +0300, Jarkko Sakkinen wrote:
> > > > > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > > > > > @@ -221,12 +224,16 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
> > > > > > >  		return ret;
> > > > > > >  	}
> > > > > > >  
> > > > > > > +	/*
> > > > > > > +	 * The page reclaimer uses the list version for synchronization
> > > > > > > +	 * instead of synchronize_srcu() because otherwise we could
> > > > > > > +	 * conflict with dup_mmap().
> > > > > > > +	 */
> > > > > > >  	spin_lock(&encl->mm_lock);
> > > > > > >  	list_add_rcu(&encl_mm->list, &encl->mm_list);
> > > > > > 
> > > > > > You dropped the smp_wmb().
> > > > > 
> > > > > As I said to you in my review, the x86 pipeline does not reorder writes.
> > > > 
> > > > And as I pointed out in this thread, smp_wmb() is a _compiler_ barrier if
> > > > and only if CONFIG_SMP=y.  The compiler can reorder list_add_rcu() and
> > > > mm_list_version++ because from its perspective there is no dependency
> > > > between the two.  And that's entirely true except for the SMP case where
> > > > the consumer of mm_list_version is relying on the list to be updated before
> > > > the version changes.
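
[For illustration, a minimal sketch of the add-side ordering being discussed,
reusing the names from the quoted hunk; the exact shape of sgx_encl_mm_add()
is up to the patch itself, only the barrier placement matters here:]

	spin_lock(&encl->mm_lock);
	list_add_rcu(&encl_mm->list, &encl->mm_list);

	/*
	 * Make the list insertion visible before the version bump;
	 * pairs with an smp_rmb() on the reclaimer side before the
	 * list walk.
	 */
	smp_wmb();

	encl->mm_list_version++;
	spin_unlock(&encl->mm_lock);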
> > > 
> > > I see.
> > > 
> > > So why not make the variable volatile, given that x86 is the only
> > > arch on which this code gets used?
> > 
> > Please note that I'm fully aware of
> > 
> > https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html
> > 
> > Just wondering. Anyway, I'll add smp_wmb() back since it is the safe
> > play in terms of acceptance.
> 
> Because volatile is overkill and too heavy-handed for what is needed here.
> E.g. if SMP=n, then there are no barriers needed whatsoever, as a CPU can't
> race with itself.
> 
> Even for SMP=y, volatile is too much as it prevents optimizations that would
> otherwise be allowed.  For the smp_wmb() it doesn't matter much, if at all,
> since the list and version are updated inside a critical section, i.e.
> barriered on both sides, so the compiler can't do much optimization anyway.
> 
> But for the smp_rmb() case, the compiler is free to cache the version in a
> register early on in the function, it just needs to make sure that the
> access is done before starting the list walk.  If encl->mm_list_version
> were volatile, the compiler would not be allowed to do such an optimization
> as it'd be required to access memory exactly when it's referenced in code.
> 
> This is easily visible in the compiled code, as encl->mm_list_version is
> only read from memory once per iteration (in the unlikely case that there
> are multiple iterations).  The compiler takes the do-while loop, which
> appears to read encl->mm_list_version twice per iteration:
> 
>   do {
>         mm_list_version = encl->mm_list_version;
> 
>         <walk list>
> 
>   } while (unlikely(encl->mm_list_version != mm_list_version));
> 
> 
> And turns it into an optimized loop that loads encl->mm_list_version the
> minimum number of times.  If encl->mm_list_version were volatile, the
> compiler would not be allowed to cache it in %rax.
> 
>         mov    0x58(%r12),%rax          // Load from encl->mm_list_version
>         lea    0x40(%r12),%rbx          // Interleaved to optimize ALU vs LD
>         and    $0xfffffffffffff000,%rbp // Interleaved to optimize ALU vs LD
>         mov    %rax,0x8(%rsp)           // mm_list_version = encl->mm_list_version
> 
> 
>   walk_mm_list:
>         <blah blah blah>
> 
>         mov    0x58(%r12),%rax          // Load from encl->mm_list_version
>         cmp    0x8(%rsp),%rax           // Compare against mm_list_version
>         jne    update_mm_list_version
> 
>         <happy path>
>         ret
> 
>   update_mm_list_version:
>         mov    %rax,0x8(%rsp)           // mm_list_version = encl->mm_list_version
>         jmpq   0xffffffff8102e460 <walk_mm_list>
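
[Likewise, a minimal sketch of how the reclaimer side could pair with that,
keeping the do-while structure quoted above; only the barrier placement is
illustrated, not any particular revision of the patch:]

	do {
		mm_list_version = encl->mm_list_version;

		/*
		 * Read the version before walking the list; pairs with
		 * the smp_wmb() between list_add_rcu() and the version
		 * increment on the add side.
		 */
		smp_rmb();

		<walk list>

	} while (unlikely(encl->mm_list_version != mm_list_version));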

Thanks, and please check v4 so that I can squash it :-)

/Jarkko


