Re: [PATCH v13 10/13] x86/sgx: Add sgx_einit() for initializing enclaves

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Tue, 4 Sep 2018 07:54:51 -0700

On Mon, Sep 03, 2018 at 04:45:14PM -0700, Huang, Kai wrote:
> > -----Original Message-----
> > From: linux-sgx-owner@xxxxxxxxxxxxxxx [mailto:linux-sgx-
> > owner@xxxxxxxxxxxxxxx] On Behalf Of Jarkko Sakkinen
> > Sent: Tuesday, September 4, 2018 7:19 AM
> > To: Christopherson, Sean J <sean.j.christopherson@xxxxxxxxx>
> > Cc: Huang, Kai <kai.huang@xxxxxxxxx>; platform-driver-x86@xxxxxxxxxxxxxxx;
> > x86@xxxxxxxxxx; nhorman@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > tglx@xxxxxxxxxxxxx; suresh.b.siddha@xxxxxxxxx; Ayoun, Serge
> > <serge.ayoun@xxxxxxxxx>; hpa@xxxxxxxxx; npmccallum@xxxxxxxxxx;
> > mingo@xxxxxxxxxx; linux-sgx@xxxxxxxxxxxxxxx; Hansen, Dave
> > <dave.hansen@xxxxxxxxx>
> > Subject: Re: [PATCH v13 10/13] x86/sgx: Add sgx_einit() for initializing enclaves
> > 
> > On Fri, Aug 31, 2018 at 11:15:09AM -0700, Sean Christopherson wrote:
> > > On Fri, Aug 31, 2018 at 03:17:03PM +0300, Jarkko Sakkinen wrote:
> > > > On Wed, Aug 29, 2018 at 07:33:54AM +0000, Huang, Kai wrote:
> > > > > [snip..]
> > > > >
> > > > > > > >
> > > > > > > > @@ -38,6 +39,18 @@ static LIST_HEAD(sgx_active_page_list);
> > > > > > > > static DEFINE_SPINLOCK(sgx_active_page_list_lock);
> > > > > > > >  static struct task_struct *ksgxswapd_tsk;  static
> > > > > > > > DECLARE_WAIT_QUEUE_HEAD(ksgxswapd_waitq);
> > > > > > > > +static struct notifier_block sgx_pm_notifier; static u64
> > > > > > > > +sgx_pm_cnt;
> > > > > > > > +
> > > > > > > > +/* The cache for the last known values of
> > > > > > > > +IA32_SGXLEPUBKEYHASHx MSRs
> > > > > > > > for each
> > > > > > > > + * CPU. The entries are initialized when they are first
> > > > > > > > + used by
> > > > > > > > sgx_einit().
> > > > > > > > + */
> > > > > > > > +struct sgx_lepubkeyhash {
> > > > > > > > +	u64 msrs[4];
> > > > > > > > +	u64 pm_cnt;
> > > > > > >
> > > > > > > May I ask why do we need pm_cnt here? In fact why do we need
> > > > > > > suspend staff (namely, sgx_pm_cnt above, and related code in
> > > > > > > this patch) here in this patch? From the patch commit message
> > > > > > > I don't see why we need PM staff here. Please give comment why
> > > > > > > you need PM staff, or you may consider to split the PM staff to another
> > patch.
> > > > > >
> > > > > > Refining the commit message probably makes more sense because
> > > > > > without PM code sgx_einit() would be broken. The MSRs have been reset
> > after waking up.
> > > > > >
> > > > > > Some kind of counter is required to keep track of the power
> > > > > > cycle. When going to sleep the sgx_pm_cnt is increased.
> > > > > > sgx_einit() compares the current value of the global count to
> > > > > > the value in the cache entry to see whether we are in a new power cycle.
> > > > >
> > > > > You mean reset to Intel default? I think we can also just reset
> > > > > the cached MSR values on each power cycle, which would be simpler,
> > IMHO?
> > > >
> > > > I don't really see that much difference in the complexity.
> > >
> > > Tracking the validity of the cache means we're hosed if we miss any
> > > condition that causes the MSRs to be reset.  I think we're better off
> > > assuming the cache can be stale at any time, i.e. don't track power
> > > cyles and instead handle EINIT failure due to INVALID_TOKEN by writing
> > > the cache+MSRs with the desired hash and retrying EINIT.  EINIT is
> > > interruptible and its latency is extremely variable in any case, e.g.
> > > tens of thousands of cycles, so this rarely-hit "slow path" probably
> > > wouldn't affect the worst case latency of EINIT.
> > 
> > Sounds a good refiniment. Pretty good solution to heal from host sleep on the
> > guest VM and then there is no need for driver changes.
> 
> To me either way should be OK, keeping MSR cache or retrying EINIT, since EINIT should not be in performance critical path I think.
> 
> But INVALID_TOKEN is not only returned when MSRs are mismatched, so do you plan to check to rule out other cases that cause INVALID_TOKEN before retrying EINIT, or unconditionally  retry EINIT? And we should only retry once?

I don't see any value in trying to rule out specific causes of
INVALID_TOKEN, but we should only retry EINIT if ret==INVALID_TOKEN
and RDMSR(HASH0) != sgx_lepubkeyhash[0].  Only the first MSR needs to
be checked for validity as they're a package deal, i.e. they'll all be
valid or all be reset.  There shouldn't be a limit on retry attempts,
e.g. the MSRs could theoretically be reset between WRMSR and EINIT.

> 
> Thanks,
> -Kai
> > 
> > /Jarkko