RE: [PATCH v1 12/22] intel_iommu: add PASID cache management infrastructure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Peter Xu <peterx@xxxxxxxxxx>
> Sent: Thursday, March 26, 2020 11:54 PM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [PATCH v1 12/22] intel_iommu: add PASID cache management
> infrastructure
> 
> On Thu, Mar 26, 2020 at 01:57:10PM +0000, Liu, Yi L wrote:
> > > From: Liu, Yi L
> > > Sent: Thursday, March 26, 2020 2:15 PM
> > > To: 'Peter Xu' <peterx@xxxxxxxxxx>
> > > Subject: RE: [PATCH v1 12/22] intel_iommu: add PASID cache management
> > > infrastructure
> > >
> > > > From: Peter Xu <peterx@xxxxxxxxxx>
> > > > Sent: Wednesday, March 25, 2020 10:52 PM
> > > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > Subject: Re: [PATCH v1 12/22] intel_iommu: add PASID cache management
> > > > infrastructure
> > > >
> > > > On Wed, Mar 25, 2020 at 12:20:21PM +0000, Liu, Yi L wrote:
> > > > > > From: Peter Xu <peterx@xxxxxxxxxx>
> > > > > > Sent: Wednesday, March 25, 2020 1:32 AM
> > > > > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > > > Subject: Re: [PATCH v1 12/22] intel_iommu: add PASID cache
> > > > > > management infrastructure
> > > > > >
> > > > > > On Sun, Mar 22, 2020 at 05:36:09AM -0700, Liu Yi L wrote:
> > > > > > > This patch adds a PASID cache management infrastructure based on
> > > > > > > new added structure VTDPASIDAddressSpace, which is used to track
> > > > > > > the PASID usage and future PASID tagged DMA address translation
> > > > > > > support in vIOMMU.
> > [...]
> > > > > >
> > > > > > <START>
> > > > > >
> > > > > > > +    /*
> > > > > > > +     * Step 2: loop all the exisitng vtd_dev_icx instances.
> > > > > > > +     * Ideally, needs to loop all devices to find if there is any new
> > > > > > > +     * PASID binding regards to the PASID cache invalidation request.
> > > > > > > +     * But it is enough to loop the devices which are backed by host
> > > > > > > +     * IOMMU. For devices backed by vIOMMU (a.k.a emulated devices),
> > > > > > > +     * if new PASID happened on them, their vtd_pasid_as instance could
> > > > > > > +     * be created during future vIOMMU DMA translation.
> > > > > > > +     */
> > > > > > > +    QLIST_FOREACH(vtd_dev_icx, &s->vtd_dev_icx_list, next) {
> > > > > > > +        VTDPASIDAddressSpace *vtd_pasid_as;
> > > > > > > +        VTDPASIDCacheEntry *pc_entry;
> > > > > > > +        VTDPASIDEntry pe;
> > > > > > > +        VTDBus *vtd_bus = vtd_dev_icx->vtd_bus;
> > > > > > > +        uint16_t devfn = vtd_dev_icx->devfn;
> > > > > > > +        int bus_n = pci_bus_num(vtd_bus->bus);
> > > > > > > +
> > > > > > > +        /* i) fetch vtd_pasid_as and check if it is valid */
> > > > > > > +        vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
> > > > > > > +                                             devfn, pasid);
> > > > > >
> > > > > > I don't feel like it's correct here...
> > > > > >
> > > > > > Assuming we have two devices assigned D1, D2.  D1 uses PASID=1, D2
> > > > > > uses
> > > > PASID=2.
> > > > > > When invalidating against PASID=1, are you also going to create a
> > > > > > VTDPASIDAddressSpace also for D2 with PASID=1?
> > > > >
> > > > > Answer is no. Before going forward, let's see if the below flow looks good to
> you.
> > > > >
> > > > > Let me add one more device besides D1 and D2. Say device D3 which
> > > > > also uses PASID=1. And assume it begins with no PASID usage in guest.
> > > > >
> > > > > Then the flow from scratch is:
> > > > >
> > > > > a) guest IOMMU driver setup PASID entry for D1 with PASID=1,
> > > > >    then invalidates against PASID #1
> > > > > b) trap to QEMU, and comes to this function. Since there is
> > > > >    no previous pasid cache invalidation, so the Step 1 of this
> > > > >    function has nothing to do, then goes to Step 2 which is to
> > > > >    loop all assigned devices and check if the guest pasid entry
> > > > >    is present. In this loop, should find D1's pasid entry for
> > > > >    PASID#1 is present. So create the VTDPASIDAddressSpace and
> > > > >    initialize its pasid_cache_entry and pasid_cache_gen fields.
> > > > > c) guest IOMMU driver setup PASID entry for D2 with PASID=2,
> > > > >    then invalidates against PASID #2
> > > > > d) same with b), only difference is the Step 1 of this function
> > > > >    will loop the VTDPASIDAddressSpace created in b), but its
> > > > >    pasid is 1 which is not the target of current pasid cache
> > > > >    invalidation. Same with b), in Step 2, it will create a
> > > > >    VTDPASIDAddressSpace for D2+PASID#2
> > > > > e) guest IOMMU driver setup PASID entry for D3 with PASID=1,
> > > > >    then invalidates against PASID #1
> > > > > f) trap to QEMU and comes to this function. Step 1 loops two
> > > > >    VTDPASIDAddressSpace created in b) and d), and it finds there
> > > > >    is a VTDPASIDAddressSpace whose pasid is 1. vtd_flush_pasid()
> > > > >    will check if the cached pasid entry is the same with the one
> > > > >    in guest memory. In this flow, it should be the same, so
> > > > >    vtd_flush_pasid() will do nothing for it. Then comes to Step 2,
> > > > >    it loops D1, D2, D3.
> > > > >    - For D1, no need to do more since there is already a
> > > > >      VTDPASIDAddressSpace created for D1+PASID#1.
> > > > >    - For D2, its guest pasid entry for PASID#1 is not present, so
> > > > >      no need to do anything for it.
> > > > >    - For D3, its guest pasid entry for PASID#1 is present and it
> > > > >      is exactly the reason for this invalidation. So create a
> > > > >      VTDPASIDAddressSpace for and init the pasid_cache_entry and
> > > > >      pasid_cache_gen fields.
> > > > >
> > > > > > I feel like we shouldn't create VTDPASIDAddressSpace only if it
> > > > > > existed, say, until when we reach vtd_dev_get_pe_from_pasid() below
> with
> > > retcode==0.
> > > > >
> > > > > You are right. I think I failed to destroy the VTDAddressSpace when
> > > > > vtd_dev_get_pe_from_pasid() returns error. Thus the code won't
> > > > > create a VTDPASIDAddressSpace for D2+PASID#1.
> > > >
> > > > OK, but that free() is really not necessary...  Please see below.
> > > >
> > > > >
> > > > > > Besides this...
> > > > > >
> > > > > > > +        pc_entry = &vtd_pasid_as->pasid_cache_entry;
> > > > > > > +        if (s->pasid_cache_gen == pc_entry->pasid_cache_gen) {
> > > > > > > +            /*
> > > > > > > +             * pasid_cache_gen equals to s->pasid_cache_gen means
> > > > > > > +             * vtd_pasid_as is valid after the above s->vtd_pasid_as
> > > > > > > +             * updates in Step 1. Thus no need for the below steps.
> > > > > > > +             */
> > > > > > > +            continue;
> > > > > > > +        }
> > > > > > > +
> > > > > > > +        /*
> > > > > > > +         * ii) vtd_pasid_as is not valid, it's potentailly a new
> > > > > > > +         *    pasid bind. Fetch guest pasid entry.
> > > > > > > +         */
> > > > > > > +        if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid,
> > > > > > > + &pe)) {
> > > > >
> > > > > Yi: should destroy pasid_as as there is no valid pasid entry. Thus
> > > > > to ensure all the pasid_as in hash table are valid.
> > > > >
> > > > > > > +            continue;
> > > > > > > +        }
> > > > > > > +
> > > > > > > +        /*
> > > > > > > +         * iii) pasid entry exists, update pasid cache
> > > > > > > +         *
> > > > > > > +         * Here need to check domain ID since guest pasid entry
> > > > > > > +         * exists. What needs to do are:
> > > > > > > +         *   - update the pc_entry in the vtd_pasid_as
> > > > > > > +         *   - set proper pc_entry.pasid_cache_gen
> > > > > > > +         *   - pass down the latest guest pasid entry config to host
> > > > > > > +         *     (will be added in later patch)
> > > > > > > +         */
> > > > > > > +        if (domain_id == vtd_pe_get_domain_id(&pe)) {
> > > > > > > +            vtd_fill_in_pe_in_cache(s, vtd_pasid_as, &pe);
> > > > > > > +        }
> > > > > > > +    }
> > > > > >
> > > > > > <END>
> > > > > >
> > > > > > ... I'm a bit confused on the whole range between <START> and
> > > > > > <END> on how it differs from the vtd_replay_guest_pasid_bindings()
> you're
> > > going to introduce.
> > > > > > Shouldn't the replay code do similar thing?  Can we merge them?
> > > > >
> > > > > Yes, there is similar thing which is to create VTDPASIDAddressSpace
> > > > > per the guest pasid entry presence.
> > > > >
> > > > > But there are differences. For one, the code here is to loop all
> > > > > assigned devices for a specific PASID. While the logic in
> > > > > vtd_replay_guest_pasid_bindings() is to loop all assigned devices
> > > > > and the PASID tables behind them. For two, the code here only cares
> > > > > about the case which guest pasid entry from INVALID->VALID.
> > > > > The reason is in Step 1 of this function, VALID->INVALID and
> > > > > VALID->VALID cases are already covered. While the logic in
> > > > > vtd_replay_guest_pasid_bindings() needs to cover all the three cases.
> > > > > The last reason I didn't merge them is in
> > > > > vtd_replay_guest_pasid_bindings() it needs to divide the pasid entry
> > > > > fetch into two steps and check the result (if fetch pasid directory
> > > > > entry failed, it could skip a range of PASIDs). While the code in
> > > > > this function, it doesn't care about it, it only cares if there is a
> > > > > valid pasid entry for this specific pasid.
> > > > >
> > > > > >
> > > > > > My understanding is that we can just make sure to do it right once
> > > > > > in the replay code (the three cases: INVALID->VALID,
> > > > > > VALID->INVALID,
> > > > > > VALID->VALID), then no matter whether it's DSI/PSI/GSI, we call
> > > > > > VALID->the
> > > > > > replay code probably with VTDPASIDCacheInfo* passed in, then the
> > > > > > replay code
> > > > will
> > > > > > know what to look after.
> > > > >
> > > > > Hmmm, let me think more to abstract the code between the <START> and
> > > > > <END> to be a helper function to be called by
> > > > > vtd_replay_guest_pasid_bindings(). Also, in that case, I need to
> > > > > apply the two step concept in the replay function.
> > > >
> > > > ... I think your vtd_sm_pasid_table_walk() is already suitable for
> > > > this because it allows to specify "start" and "end" pasid.  Now you're
> > > > always passing in the (0, VTD_MAX_HPASID) tuple, here you can simply
> > > > pass in (pasid, pasid+1).  But I think you need to touch up
> > > > vtd_sm_pasid_table_walk() a bit to make sure it allows the pasid to be
> > > > not aliged to VTD_PASID_TBL_ENTRY_NUM.
> > > >
> > > > Another thing is I think vtd_sm_pasid_table_walk_one() didn't really
> > > > check vtd_pasid_table_walk_info.did domain information...  When that's
> > > > fixed, this case is the same as the DSI walk with a specific pasid
> > > > range.
> > >
> > > got it, let me refactor them (PSI and replay).
> > >
> > > > And again, please also consider to use VTDPASIDCacheInfo to be used
> > > > directly during the page walk, so vtd_pasid_table_walk_info can be
> > > > replaced I think, because IIUC VTDPASIDCacheInfo contains all
> > > > information the table walk will need.
> > >
> > > yes, no need to have the walk_info structure.
> > I'm not quite get here. Why cache_gen increase only happen in PSI
> > hook? I think cache_gen used to avoid drop all pasid_as when a pasid
> > cache reset happened.
> 
> (Is this paragraph for the other thread?  Let me know if it's not, or
>  I'll skip it)
yes :-(, please skip it.

> >
> >
> > Today, I'm trying to replace vtd_pasid_table_walk_info with
> > VTDPASIDCacheInfo. But I found it may be a little bit strange.
> > The vtd_pasid_table_walk_info include vtd_bus/devfn/did and a
> > flag to indicate if did is useful.
> 
> vtd_pasid_table_walk_info.flags can only be either
> VTD_PASID_TABLE_DID_SEL_WALK or nothing, but IIUC that's the same as
> checking against VTDPASIDCacheInfo.flags with:
> 
>     (VTD_PASID_CACHE_DOMSI | VTD_PASID_CACHE_PASIDSI)
> 
> > The final user of the walk
> > info is vtd_sm_pasid_table_walk_one() which only cares about
> > the the vtd_bus/devfn/did. But VTDPASIDCacheInfo has an extra
> > pasid field and also has multiple flag definitions
> 
> We can simply ignore the pasid field when walking the pasid table?
> Just like we'll also ignore the domain id field if flag not set.
> 
> > , which are
> > not necessary for the table work. So it appears to me use
> > separate structure would be better. Maybe I can show you when
> > sending out the code.
> 
> I still keep my previous comment that I think VTDPASIDCacheInfo can do
> all the work, especially because all the pasid table walk triggers
> from a pasid flush, so we can really reuse exactly the same
> VTDPASIDCacheInfo structure that we just allocated, iiuc.

Ok, let me have a try.

> >
> > > > >
> > > > > > > +
> > > > > > > +    vtd_iommu_unlock(s);
> > > > > > >      return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Caller of this function should hold iommu_lock  */ static
> > > > > > > +void vtd_pasid_cache_reset(IntelIOMMUState *s) {
> > > > > > > +    VTDPASIDCacheInfo pc_info;
> > > > > > > +
> > > > > > > +    trace_vtd_pasid_cache_reset();
> > > > > > > +
> > > > > > > +    pc_info.flags = VTD_PASID_CACHE_GLOBAL;
> > > > > > > +
> > > > > > > +    /*
> > > > > > > +     * Reset pasid cache is a big hammer, so use
> > > > > > > +     * g_hash_table_foreach_remove which will free
> > > > > > > +     * the vtd_pasid_as instances, indicates the
> > > > > > > +     * cached pasid_cache_gen would be set to 0.
> > > > > > > +     */
> > > > > > > +    g_hash_table_foreach_remove(s->vtd_pasid_as,
> > > > > > > +                           vtd_flush_pasid, &pc_info);
> > > > > >
> > > > > > Would this make sure the per pasid_as pasid_cache_gen will be reset to
> zero?
> > > > I'm
> > > > > > not very sure, say, what if the memory is stall during a reset and
> > > > > > still have the
> > > > old
> > > > > > data?
> > > > > >
> > > > > > I'm not sure, but I feel like we should simply drop all pasid_as
> > > > > > here, rather than using the same code for a global pasid invalidation.
> > > > >
> > > > > I see. Maybe I can get another helper function which always returns
> > > > > true, and replace vtd_flush_pasid with the new function. This should
> > > > > ensure all pasid_as are dropped. How do you think?
> > > >
> > > > g_hash_table_remove_all() might be easier. :)
> > >
> > > right. I'll make it.
> >
> > Sorry to reclaim my reply here. I think here still needs a function (say
> > vtd_flush_pasid) to check if needs to notify host do unbind. e.g. If guest
> > unbind a pasid in guest, and issues a GSI (pasid cache), remove_all()
> > will drop all pasid_as, this would be a problem. The guest unbind will
> > not be propagated to host. And even we add a replay after it, it can
> > only shadow the bindings in guest to host, but cannot figure out an
> > unbind. But I agree with you that vtd_pasid_cache_reset() should drop
> > all pasid_as but also needs to notify host properly.
> 
> Agreed, anyway we should not depend on the pasid entry but we should
> simply loop over all items and force unbind all of them before the
> g_hash_table_remove_all().

sounds good.

Regards,
Yi Liu





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux