On Tue, 1 Dec 2020 17:12:56 -0500 Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote: > On 12/1/20 12:56 PM, Halil Pasic wrote: > > On Tue, 1 Dec 2020 00:32:27 +0100 > > Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > > > >>> > >>> On 11/28/20 8:52 PM, Halil Pasic wrote: > >> [..] > >>>>> * Unassign adapter from mdev's matrix: > >>>>> > >>>>> The domain will be hot unplugged from the KVM guest if it is > >>>>> assigned to the guest's matrix. > >>>>> > >>>>> * Assign a control domain: > >>>>> > >>>>> The control domain will be hot plugged into the KVM guest if it is not > >>>>> assigned to the guest's APCB. The AP architecture ensures a guest will > >>>>> only get access to the control domain if it is in the host's AP > >>>>> configuration, so there is no risk in hot plugging it; however, it will > >>>>> become automatically available to the guest when it is added to the host > >>>>> configuration. > >>>>> > >>>>> * Unassign a control domain: > >>>>> > >>>>> The control domain will be hot unplugged from the KVM guest if it is > >>>>> assigned to the guest's APCB. > >>>> This is where things start getting tricky. E.g. do we need to revise > >>>> filtering after an unassign? (For example an assign_adapter X didn't > >>>> change the shadow, because queue XY was missing, but now we unplug domain > >>>> Y. Should the adapter X pop up? I guess it should.) > >>> I suppose that makes sense at the expense of making the code > >>> more complex. It is essentially what we had in the prior version > >>> which used the same filtering code for assignment as well as > >>> host AP configuration changes. > >>> > >> Will have to think about it some more. Making the user unplug and > >> replug an adapter because at some point it got filtered, but there > >> is no need to filter it does not feel right. On the other hand, I'm > >> afraid I'm complaining in circles. > > I did some thinking. The following statements are about the state of > > affairs, when all 17 patches are applied. I'm commenting here, because > > I believe this is the patch that introduces the most controversial code. > > > > First about low level problems with the current code/design. The other is > > empty handling in vfio_ap_assign_apid_to_apcb() (and > > vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product > > allows for over-commitment, i.e. assignment of e.g. domains that > > are not in the crypto host config. Let's assume the host LPAR > > has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask > > and aqmask are both 0 (all in on vfio), all bound. We start with an empty > > mdev that is tied to a running guest: > > assign_adapter 1 > > assign_adapter 2 > > assign_adapter 3 > > assign_adapter 4 > > all of these will work. The resulting shadow_apcb is completely empty. No > > commit_apcb. > > assign_domain 1 > > assign_domain 2 > > assign_domain 3 > > all of these will work. But again the shadow_apcb is completely empty at > > the end: we did get to the loop that is checking the boundness of the > > queues, but please note that we are checking against matrix.apm, and > > adapter 4 is not in the config of the host. > > > > I've hacked up a fixup patch for these problems that simplifies the > > code considerably, but there are design level issues, that run deeper, > > so I'm not sure the fixups are the way to go. > > > > Now lets talk about design level stuff. Currently the assignment > > operations are designed in to accommodate the FCFS principle. This > > is a blessing and a curse at the same time. > > > > Consider the following scenarios. We have an empty (nothing assigned > > mdev) and the following queues are bound to the vfio_ap driver: > > 0.0 > > 0.1 > > 1.0 > > If the we do > > asssign_adapter 0 > > assign_domain 0 > > assign_domain 1 > > assign_adapter 1 > > We end up with the guest_matrix > > 0.0 > > 0.1 > > and the matrix > > 0.0 > > 0.1 > > 1.0 > > 1.0 > > > > That is a different result compared to > > asssign_adapter 0 > > assign_domain 0 > > assign_adapter 1 > > assign_domain 1 > > or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap > > and then 1.1 gets unbound. > > In v11 of the patch series, the filtering code always filters > the matrix assigned to the mdev and is invoked whenever > an adapter or domain is assigned, a queue is probed and > when the AP bus scan complete notification is received and > adapters and/or domains have been added to the host AP > configuration. So I made a slight modification to that > filtering function to filter only by APID and ran the above > scenarios. In each case, the resulting guest matrix was > identicle. I also tested the bind/unbind and achieved the > same results. > > > > > For the same system state (bound, config, ap_perm, matrix) you get a > > different outcomes (guest_matrix), because the outcomes depend on > > history. > > > > Another thing is recovery. I believe the main idea behind shadow_apcb > > is that we should auto recover once the resources are available again. > > The current design choices make recovery more difficult to think about > > because we may end up having either the apid or the apqi filtered on > > a 'hole' (an queue missing for reasons different than, belonging to > > default, or not being in the host config). > > The filtering code from the v11 series with the tweak I > mentioned above accomplishes this. I tested this by > doing manual binds/unbinds of a queue using the > scenarios you layed out. > > > > > I still think for these cases filtering out the apid is the lesser > > evil. Yes a hotplug of a domain making hot unplugging an adapter is > > ugly, but at least I can describe that. So I propose the following. > > Let me hack up a fixup that morphs things in this direction. Maybe > > I will run into unexpected problems, but if I don't then we will > > have an alternative design you can run your testcases against. How about > > that? > > I appreciate the offer, but I believe with the change to the v11 > filtering code I described above we have a solution. One of > your objections to the filtering code was looping over all > assigned adapters/domains each time an adapter or > domain is assigned. It should also be easy to examine only > the APQNs involving the new APID or APQI being assigned. > Again, I appreciate your offer, but I don't think it is necessary > to take you away from your priorities to involve yourself in > mine. Seems you have it sorted out. Unfortunately I can't really follow without code, but I have to trust you. Can you please spin a v13 with these improvements implemented? Maybe I didn't comment on every patch, but I did go through all of them. I believe we have enough material for another iteration, and further review makes no sense at this point. I intend to come back to this once v13 is out. Thanks, Halil