Re: [PATCH v4] x86/sgx: Fix the call order of synchronize_srcu() in sgx_release()

Jarkko Sakkinen <jarkko@xxxxxxxxxx> · Thu, 21 Jan 2021 14:55:17 +0200

On Wed, Jan 20, 2021 at 05:19:44PM -0800, Dave Hansen wrote:
> On 1/20/21 4:29 PM, Jarkko Sakkinen wrote:
> > On Wed, Jan 20, 2021 at 09:35:28AM -0800, Sean Christopherson wrote:
> >> On Fri, Jan 15, 2021, jarkko@xxxxxxxxxx wrote:
> >>> From: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> >>>
> >>> The most trivial example of a race condition can be demonstrated with this
> >>> example where mm_list contains just one entry:
> >>>
> >>> CPU A                   CPU B
> >>> sgx_release()
> >>>                         sgx_mmu_notifier_release()
> >>>                         list_del_rcu()
> >>> sgx_encl_release()
> >>>                         synchronize_srcu()
> >>> cleanup_srcu_struct()
> >>>
> >>> To fix this, call synchronize_srcu() before checking whether mm_list is
> >>> empty in sgx_release().
> >> Why haven't you included the splat that Haitao provided?  That would go a long
> >> way to helping answer Boris' question about exactly what is broken...
> > I've lost the klog.
> 
> Haitao said he thought it was this:
> 
> > void cleanup_srcu_struct(struct srcu_struct *ssp)
> > {
> ...
> >         if (WARN_ON(srcu_readers_active(ssp)))
> >                 return; /* Just leak it! */
> 
> Which would indicate that an 'encl' kref reached 0 while some other
> thread was inside a
> 
>         idx = srcu_read_lock(&encl->srcu);
> 	...
> 	srcu_read_unlock(&encl->srcu, idx);
> 
> critical section.  A quick audit didn't turn up any obvious suspects,
> though.
> 
> If that *is* it, it might be nice to try to catch the culprit at
> srcu_read_{un}lock() time.  If there's ever a 0 refcount at those sites,
> it would be nice to spew a warning:
> 
>         idx = srcu_read_lock(&encl->srcu);
> 	WARN_ON(!atomic_read(&encl->refcount->refcount);
> 	...
> 	WARN_ON(!atomic_read(&encl->refcount->refcount);
> 	srcu_read_unlock(&encl->srcu, idx);
> 
> at each site.

The root cause is fully known already and audited.

An mm_list entry is kept up until the process exits *or* when
VFS close happens, and sgx_release() executes and removes it.

It's entirely possible that MMU notifier callback registered
by a process happens while sgx_release() is executing, which
causes list_del_rcu() to happen, unnoticed by sgx_release().

If that empties the list, cleanup_srcu_struct() is executed
unsynchronized in the middle a unfinished grace period.

/Jarkko