On Fri, 17 Nov 2017 15:28:16 -0500 Tony Krowiak <akrowiak@xxxxxxxxxxxxxxxxxx> wrote: > On 11/17/2017 05:07 AM, Cornelia Huck wrote: > > On Fri, 17 Nov 2017 08:07:15 +0100 > > Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote: > > > >> On 17/11/2017 00:35, Tony Krowiak wrote: > >>> On 11/16/2017 03:25 PM, Pierre Morel wrote: > >>>> On 16/11/2017 18:03, Cornelia Huck wrote: > >>>>> On Thu, 16 Nov 2017 17:06:58 +0100 > >>>>> Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote: > >>>>>> So I totally agree with Conny on that we should stabilize the > >>>>>> bus/device/driver modeling. > >>>>>> > >>>>>> I think it would be here a good place to start the discussion on things > >>>>>> like we started to discuss, Harald and I, off-line: > >>>>>> - why a matrix bus, in which case we can avoid it > >>>>> I thought it had been agreed that we should be able to ditch it? > >>>> I have not see any comment on the matrix bus. > >>> As stated in a previous email responding to Connie, I decided to scrap the > >>> AP matrix bus. There will only ever be one matrix device that serves two > >>> purposes: To hold the APQNs of the queue devices bound to the VFIO AP > >>> matrix > >>> device driver; to serve as a parent of the mediated devices created for > >>> guests requiring access to the APQNs reserved for their use. So, instead > >>> of an AP matrix bus creating the matrix device, it will be created by the > >>> VFIO AP matrix driver in /sys/devices/ap_matrix/ during driver > >>> initialization. > >> Sorry, I did not see the mail, this of course change a lot of things... > > One thing that would be useful for the next iteration is some ascii-art > > representation that shows how the different parts (matrix, ap driver, > > mdev, ...) tie together. That also would be useful to have in the > > documentation. > I plan on including some drawings with the documentation and will include it > in the cover letter as well. Sounds good. > >>>>>> - how to handle the repartition of queues on boot, reset and hotplug > >>> What do you mean by repartition of queues on boot? > >>>>> That's something I'd like to see a writeup for. > >>>> yes, and it may have an influence on the bus/device/driver/mdev design > >>> I don't understand the need to avoid implementation details. If you recall, > >>> the original design was modeled on AP queue devices. It was only after > >>> implementing that design that the shortcomings were revealed which is > >>> why we decided to base the model on the AP matrix. Keep in mind, this is > >>> an RFC, not a final patch set. I would expect some change from the > >>> implementation herein. In fact, I've already made many changes based on > >>> Connie's and Christian's review comments, none of which resulted in an > >>> overhaul of the design. > > I expect that any of the above can be accommodated by the design. A > > short writeup of what we may want to do for that would certainly help > > to validate that, though. > I have spent some time thinking about hotplug implementation and I > believe it can be accommodated within this design. I haven't looked > at the implications for reset yet and I don't really know what is > meant by "repartition of queues". I will include a write-up in the > next submission. FWIW, "repartition of queues" is also unclear to me. > >>>>>> - interruptions > >>>>> My understanding is that interrupts are optional so they can be left > >>>>> out in the first shot. With the gisa (that has not yet been posted), it > >>>>> should not be too difficult, no? > >>>> you are right I forgot that it is optional > >>> If the facilities bit (STFLE.65) indicating interrupts are available is not > >>> set for the guest, then the AP bus running on the guest will poll and > >>> interrupts will not have to be handled. This patch set does not enable > >>> interrupts, so it is not relevant at this time. We will not be able to > >>> handle interrupts for the guest until the GISA for passthrough patches > >>> are available. This will be addressed at that time. > > If you think it can be easily added later on, that would be fine for > > me. (I cannot comment on gisa details until it has been posted, > > obviously.) > Enabling AP interrupts is accomplished using the PQAP(AQIC) instruction > which is a mandatory interception. The instruction will be forwarded to > the VFIO AP device driver via an ioctl call on the mediated matrix > device file descriptor. There will be some GISA set up needed and code > to feed the interrupt back to user space, but I believe that will be > provided by the forthcoming GISA passthrough patches. The bottom line is, > I don't anticipate any major design change to handle interrupt processing. Cool, that's what I wanted to hear. > >>>>>> - virtualization of the AP > >>>>> Is this really needed? It would complicate everything a lot. > >>>> Concern has no sens without interception. > >>> Virtualization of AP is not on the table right now. > >> If we implement interception, we must speak about this, even to say how > >> we do not implement virtualization. > > A note that we do not plan to virtualize it right now would be > > sensible, yes. > Will do. > > > > From what I remember, this would mean opening a huge can of worms for > > something that might be only of limited use. I'd prefer a simplistic > > but usable approach first. If virtualization should really become a > > requirement in the future, it might be better served by a different > > mechanism anyway. > I have done a little proof of concept code to get an idea if the AP matrix > design will be extensible to handle virtualization. I modeled the > proof of concept on the AP matrix model by creating a second mediated > matrix device type for virtualization. Of course, virtual and passthrough > matrix device types would have to be mutually exclusive; the admin would > have > to choose one or the other. The sysfs model looked like this: > > /sys/devices/ap_matrix > ... [matrix] > ...... [mdev_supported_types] > ......... [vfio_ap_matrix-virtual] > ............ create > ............... [devices] > .................. [$uuid] > ..................... assign_adapter > ..................... assign_domain > > Using the a assign_adapter file, one can assign a virtual adapter > ID to one or more real adapter IDs. For example, to assign virtual adapter > 4 to real adapters 3, 22 and 254: > > echo 4:3,22,254 > assign_adapter > > Using the a assign_domain file, one can assign a virtual domain > ID to one or more real domain IDs. For example, to assign virtual domain > 0 to real domains 8 and 71: > > echo 0:8,0x47 > assign_domain > > All AP instructions would be intercepted for a virtual matrix. The > intercepted > instructions would be forwarded to the VFIO AP matrix device driver by QEMU > using an ioctl implemented by the VFIO AP matrix driver. If the mediated > matrix > device is vfio_ap_matrix-passthrough type, things would work as they do now. > If the type is vfio_ap_matrix-virtual, the the driver would: > > 1. Calculate all of the real APQNs that can be used by: > * Retrieving the adapter IDs mapped to the APID specified in the APQN > contained in the AP instruction > * Retrieving the domain IDs mapped to the APQI specified in the APQN > contained in the AP instruction > * Combining all of the permutations of APID/APQI > 2. Determine which APQN would be best to use. > 3. Execute the instruction > 4. Return the result to the caller > > In other words, I think the current design is extensible; but even if not, > I see no reason we can't design a completely different mechanism for > virtualization. So it's basically a one-time effort at (re)configuration, and the virtualization facility will basically take care of the rest?