* Yan Zhao (yan.y.zhao@xxxxxxxxx) wrote:
> > > yes, include a device_api field is better.
> > > for mdev, "device_type=vfio-mdev", is it right?
> >
> > No, vfio-mdev is not a device API, it's the driver that attaches to the
> > mdev bus device to expose it through vfio. The device_api exposes the
> > actual interface of the vfio device, it's also vfio-pci for typical
> > mdev devices found on x86, but may be vfio-ccw, vfio-ap, etc... See
> > VFIO_DEVICE_API_PCI_STRING and friends.
> >
> ok. got it.
>
> > > > > > device_id=8086591d
> > > >
> > > > Is device_id interpreted relative to device_type? How does this
> > > > relate to mdev_type? If we have an mdev_type, doesn't that fully
> > > > defined the software API?
> > > >
> > > it's parent pci id for mdev actually.
> >
> > If we need to specify the parent PCI ID then something is fundamentally
> > wrong with the mdev_type. The mdev_type should define a unique,
> > software compatible interface, regardless of the parent device IDs. If
> > a i915-GVTg_V5_2 means different things based on the parent device IDs,
> > then then different mdev_types should be reported for those parent
> > devices.
> >
> hmm, then do we allow vendor specific fields?
> or is it a must that a vendor specific field should have corresponding
> vendor attribute?
>
> another thing is that the definition of mdev_type in GVT only corresponds
> to vGPU computing ability currently,
> e.g. i915-GVTg_V5_2, is 1/2 of a gen9 IGD, i915-GVTg_V4_2 is 1/2 of a
> gen8 IGD.
> It is too coarse-grained to live migration compatibility.

Can you explain why that's too coarse? Is this because it's too
specific (i.e. that a i915-GVTg_V4_2 could be migrated to a newer
device?), or that it's too specific on the exact sizings (i.e. that
there may be multiple different sizes of a gen9)?

Dave

> Do you think we need to update GVT's definition of mdev_type?
> And is there any guide in mdev_type definition?
>
> > > > > > mdev_type=i915-GVTg_V5_2
> > > >
> > > > And how are non-mdev devices represented?
> > > >
> > > non-mdev can opt to not include this field, or as you said below, a
> > > vendor signature.
> > >
> > > > > > aggregator=1
> > > > > > pv_mode="none+ppgtt+context"
> > > >
> > > > These are meaningless vendor specific matches afaict.
> > > >
> > > yes, pv_mode and aggregator are vendor specific fields.
> > > but they are important to decide whether two devices are compatible.
> > > pv_mode means whether a vGPU supports guest paravirtualized api.
> > > "none+ppgtt+context" means guest can not use pv, or use ppgtt mode pv or
> > > use context mode pv.
> > >
> > > > > > interface_version=3
> > > >
> > > > Not much granularity here, I prefer Sean's previous
> > > > <major>.<minor>[.bugfix] scheme.
> > > >
> > > yes, <major>.<minor>[.bugfix] scheme may be better, but I'm not sure if
> > > it works for a complicated scenario.
> > > e.g for pv_mode,
> > > (1) initially, pv_mode is not supported, so it's pv_mode=none, it's 0.0.0,
> > > (2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0,
> > > indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa.
> > > (3) later, pv_mode=context is also supported,
> > > pv_mode="none+ppgtt+context", so it's 0.2.0.
> > >
> > > But if later, pv_mode=ppgtt is removed. pv_mode="none+context", how to
> > > name its version? "none+ppgtt" (0.1.0) is not compatible to
> > > "none+context", but "none+ppgtt+context" (0.2.0) is compatible to
> > > "none+context".
> >
> > If pv_mode=ppgtt is removed, then the compatible versions would be
> > 0.0.0 or 1.0.0, ie. the major version would be incremented due to
> > feature removal.
> >
> > > Maintain such scheme is painful to vendor driver.
> >
> > Migration compatibility is painful, there's no way around that.
> > I think the version scheme is an attempt to push some of that low level
> > burden on the vendor driver, otherwise the management tools need to
> > work on an ever growing matrix of vendor specific features which is
> > going to become unwieldy and is largely meaningless outside of the
> > vendor driver. Instead, the vendor driver can make strategic decisions
> > about where to continue to maintain a support burden and make explicit
> > decisions to maintain or break compatibility. The version scheme is a
> > simplification and abstraction of vendor driver features in order to
> > create a small, logical compatibility matrix. Compromises necessarily
> > need to be made for that to occur.
> >
> ok. got it.
>
> > > > > > COMPATIBLE:
> > > > > > device_type=pci
> > > > > > device_id=8086591d
> > > > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}
> > > > > this mixed notation will be hard to parse so i would avoid that.
> > > >
> > > > Some background, Intel has been proposing aggregation as a solution to
> > > > how we scale mdev devices when hardware exposes large numbers of
> > > > assignable objects that can be composed in essentially arbitrary ways.
> > > > So for instance, if we have a workqueue (wq), we might have an mdev
> > > > type for 1wq, 2wq, 3wq,... Nwq. It's not really practical to expose a
> > > > discrete mdev type for each of those, so they want to define a base
> > > > type which is composable to other types via this aggregation. This is
> > > > what this substitution and tagging is attempting to accomplish. So
> > > > imagine this set of values for cases where it's not practical to unroll
> > > > the values for N discrete types.
> > > >
> > > > > > aggregator={val1}/2
> > > >
> > > > So the {val1} above would be substituted here, though an aggregation
> > > > factor of 1/2 is a head scratcher...
> > > >
> > > > > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"}
> > > >
> > > > I'm lost on this one though.
> > > > I think maybe it's indicating that it's
> > > > compatible with any of these, so do we need to list it? Couldn't this
> > > > be handled by Sean's version proposal where the minor version
> > > > represents feature compatibility?
> > > yes, it's indicating that it's compatible with any of these.
> > > Sean's version proposal may also work, but it would be painful for
> > > vendor driver to maintain the versions when multiple similar features
> > > are involved.
> >
> > This is something vendor drivers need to consider when adding and
> > removing features.
> >
> > > > > > interface_version={val3:int:2,3}
> > > >
> > > > What does this turn into in a few years, 2,7,12,23,75,96,...
> > > >
> > > is a range better?
> >
> > I was really trying to point out that sparseness becomes an issue if
> > the vendor driver is largely disconnected from how their feature
> > addition and deprecation affects migration support. Thanks,
> >
> ok. we'll use the x.y.z scheme then.
>
> Thanks
> Yan
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
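
For illustration only, the x.y.z rule discussed in this thread could be
sketched roughly as below. This is not code from any vendor driver or
management tool. The names Version, parse_version, can_migrate and
extra_compatible are hypothetical, and the rule itself is an assumption
drawn from the pv_mode example: the source can migrate to a destination
with the same major version and an equal or higher minor version, the
bugfix field is ignored, and after a major bump the destination may
explicitly list older versions it still accepts (the "0.0.0 or 1.0.0"
case).

```python
from typing import FrozenSet, NamedTuple


class Version(NamedTuple):
    """Hypothetical <major>.<minor>[.bugfix] interface version."""
    major: int
    minor: int
    bugfix: int = 0


def parse_version(text: str) -> Version:
    """Parse "x.y" or "x.y.z"; a missing bugfix field defaults to 0."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected <major>.<minor>[.bugfix], got {text!r}")
    return Version(*parts)


def can_migrate(src: Version, dst: Version,
                extra_compatible: FrozenSet[Version] = frozenset()) -> bool:
    """Assumed rule: same major and src.minor <= dst.minor.

    extra_compatible is a hypothetical set of older versions that the
    destination's vendor driver explicitly keeps accepting after a
    major bump (for example 0.0.0 accepted by 1.0.0).
    The bugfix field is deliberately not compared.
    """
    if src in extra_compatible:
        return True
    return src.major == dst.major and src.minor <= dst.minor
```

Under these assumptions, 0.0.0 (pv_mode=none) can migrate to 0.1.0
("none+ppgtt") but not the reverse, and 0.1.0 cannot migrate to 1.0.0
("none+context") unless the destination lists it explicitly.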