RE: [RFC v3 14/25] intel_iommu: add virtual command capability support

"Liu, Yi L" <yi.l.liu@xxxxxxxxx> · Thu, 13 Feb 2020 02:40:45 +0000

> From: Peter Xu <peterx@xxxxxxxxxx>
> Sent: Wednesday, February 12, 2020 5:57 AM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [RFC v3 14/25] intel_iommu: add virtual command capability support
> 
> On Wed, Jan 29, 2020 at 04:16:45AM -0800, Liu, Yi L wrote:
> > +/*
> > + * The basic idea is to let hypervisor to set a range for available
> > + * PASIDs for VMs. One of the reasons is PASID #0 is reserved by
> > + * RID_PASID usage. We have no idea how many reserved PASIDs in future,
> > + * so here just an evaluated value. Honestly, set it as "1" is enough
> > + * at current stage.
> > + */
> > +#define VTD_MIN_HPASID              1
> > +#define VTD_MAX_HPASID              0xFFFFF
> 
> One more question: I see that PASID is defined as 20bits long.  It's
> fine.  However I start to get confused on how the Scalable Mode PASID
> Directory could service that much of PASID entries.
> 
> I'm looking at spec 3.4.3, Figure 3-8.
> 
> Firstly, we only have two levels for a PASID table.  The context entry
> of a device stores a pointer to the "Scalable Mode PASID Directory"
> page. I see that there're 2^14 entries in "Scalable Mode PASID
> Directory" page, each is a "Scalable Mode PASID Table".
> However... how do we fit in the 4K page if each entry is a pointer of
> x86_64 (8 bytes) while there're 2^14 entries?  A simple math gives me
> 4K/8 = 512, which means the "Scalable Mode PASID Directory" page can
> only have 512 entries, then how the 2^14 come from?  Hmm??

I checked with Kevin. The spec doesn't say the dir table is 4K. It says 4K
only for pasid table. Also, if you look at 9.4, scalabe-mode context entry
includes a PDTS field to specify the actual size of the directory table.

> Apart of this: also I just noticed (when reading the latter part of
> the series) that the time that a pasid table walk can consume will
> depend on this value too.  I'd suggest to make this as small as we
> can, as long as it satisfies the usage.  We can even bump it in the
> future.

I see. This looks to be an optimization. right? Instead of modify the
value of this macro,  I think we can do this optimization by tracking
the allocated PASIDs in QEMU. Thus, the pasid table walk  would be more
efficient and also no dependency on the VTD_MAX_HPASID. Does it make
sense to you? :-)

Regards,
Yi Liu