On Mon, Apr 06, 2020 at 08:15:56PM +0300, Jarkko Sakkinen wrote:
> On Mon, Apr 06, 2020 at 08:10:29PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Apr 06, 2020 at 07:36:38AM -0700, Sean Christopherson wrote:
> > > On Sat, Apr 04, 2020 at 04:12:02AM +0300, Jarkko Sakkinen wrote:
> > > > On Fri, Apr 03, 2020 at 04:42:39PM -0700, Sean Christopherson wrote:
> > > > > On Fri, Apr 03, 2020 at 12:35:50PM +0300, Jarkko Sakkinen wrote:
> > > > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > > > > @@ -221,12 +224,16 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
> > > > > >  		return ret;
> > > > > >  	}
> > > > > >
> > > > > > +	/*
> > > > > > +	 * The page reclaimer uses the list version for synchronization instead of
> > > > > > +	 * synchronize_srcu() because otherwise we could conflict with
> > > > > > +	 * dup_mmap().
> > > > > > +	 */
> > > > > >  	spin_lock(&encl->mm_lock);
> > > > > >  	list_add_rcu(&encl_mm->list, &encl->mm_list);
> > > > >
> > > > > You dropped the smp_wmb().
> > > >
> > > > As I said to you in my review, the x86 pipeline does not reorder writes.
> > >
> > > And as I pointed out in this thread, smp_wmb() is a _compiler_ barrier if
> > > and only if CONFIG_SMP=y.  The compiler can reorder list_add_rcu() and
> > > mm_list_version++ because from its perspective there is no dependency
> > > between the two.  And that's entirely true except for the SMP case, where
> > > the consumer of mm_list_version is relying on the list to be updated before
> > > the version changes.
> >
> > I see.
> >
> > So why not change the variable to volatile, given that x86 is the only
> > arch where this code gets used?
>
> Please note that I'm fully aware of
>
> https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html
>
> Just wondering.  Anyway, I'll add smp_wmb() back since it is the safe play
> in terms of acceptance.

Because volatile is overkill and too heavy-handed for what is needed here.
E.g. if SMP=n, then no barriers are needed whatsoever, as a CPU can't race
with itself.

Even for SMP=y, volatile is too much, as it prevents optimizations that would
otherwise be allowed.  For the smp_wmb() it doesn't matter much, if at all,
since the list and version are updated inside a critical section, i.e.
barriered on both sides, so the compiler can't do much optimization anyway.

But for the smp_rmb() case, the compiler is free to cache the version in a
register early on in the function; it just needs to make sure the access is
done before starting the list walk.  If encl->mm_list_version were volatile,
the compiler would not be allowed to do such an optimization, as it would be
required to access memory exactly when the variable is referenced in code.

This is easily visible in the compiled code, as encl->mm_list_version is only
read from memory once per iteration (in the unlikely case that there are
multiple iterations).

It takes the do-while loop, which appears to read encl->mm_list_version twice
per iteration:

	do {
		mm_list_version = encl->mm_list_version;

		<walk list>

	} while (unlikely(encl->mm_list_version != mm_list_version));

And turns it into an optimized loop that loads encl->mm_list_version the
minimum number of times.  If encl->mm_list_version were volatile, the
compiler would not be allowed to cache it in %rax.  (A sketch of the
smp_wmb()/smp_rmb() pairing itself follows the disassembly below.)
	mov    0x58(%r12),%rax             // Load from encl->mm_list_version
	lea    0x40(%r12),%rbx             // Interleaved to optimize ALU vs LD
	and    $0xfffffffffffff000,%rbp    // Interleaved to optimize ALU vs LD
	mov    %rax,0x8(%rsp)              // mm_list_version = encl->mm_list_version

walk_mm_list:
	<blah blah blah>

	mov    0x58(%r12),%rax             // Load from encl->mm_list_version
	cmp    0x8(%rsp),%rax              // Compare against mm_list_version
	jne    update_mm_list_version

	<happy path>

	ret

update_mm_list_version:
	mov    %rax,0x8(%rsp)              // mm_list_version = encl->mm_list_version
	jmpq   0xffffffff8102e460 <walk_mm_list>
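
For reference, a rough sketch of the smp_wmb()/smp_rmb() pairing being
discussed, not the exact driver code: the field names (encl->mm_list,
encl->mm_list_version, encl->mm_lock, encl_mm->list) follow the thread, the
list-walk body is elided, and the surrounding functions are omitted.  The
writer publishes the new list entry before bumping the version; the reader
orders its version read before the walk and re-checks the version afterwards.

	/* Writer side (the sgx_encl_mm_add() path quoted above), sketched: */
	spin_lock(&encl->mm_lock);
	list_add_rcu(&encl_mm->list, &encl->mm_list);
	/*
	 * Publish the list entry before bumping the version so that a reader
	 * which observes the new version also observes the new entry.  Pairs
	 * with the smp_rmb() in the reader below.
	 */
	smp_wmb();
	encl->mm_list_version++;
	spin_unlock(&encl->mm_lock);

	/* Reader side (the reclaimer's do-while loop above), sketched: */
	do {
		mm_list_version = encl->mm_list_version;

		/*
		 * Pairs with the smp_wmb() above: read the version before
		 * starting the list walk.
		 */
		smp_rmb();

		rcu_read_lock();
		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
			/* <walk list> */
		}
		rcu_read_unlock();

	} while (unlikely(encl->mm_list_version != mm_list_version));

With the barriers placed like this, a reader racing with sgx_encl_mm_add()
either sees the new entry or retries the walk because the version changed,
which is the property the list version provides in place of
synchronize_srcu(), per the comment in the quoted hunk.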