RE: [PATCH v3 0/2] VFIO mdev aggregated resources handling

> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Thursday, July 9, 2020 2:48 AM
> 
> On Wed, 8 Jul 2020 06:31:00 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Wednesday, July 8, 2020 9:07 AM
> > >
> > > On Tue, 7 Jul 2020 23:28:39 +0000
> > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > >
> > > > Hi, Alex,
> > > >
> > > > Gentle ping... Please let us know whether this version looks good.
> > >
> > > I figured this is entangled with the versioning scheme.  There are
> > > unanswered questions about how something that assumes a device of a
> > > given type is software compatible to another device of the same type
> > > handles aggregation and how the type class would indicate compatibility
> > > with an aggregated instance.  Thanks,
> > >
> >
> > Yes, this open is an interesting topic. I didn't closely follow the versioning
> > scheme discussion. Below is some preliminary thought in my mind:
> >
> > --
> > First, let's consider migrating an aggregated instance:
> >
> > A conservative policy is to check whether the compatible type is supported
> > on target device and whether available instances under that type can
> > afford the ask of the aggregated instance. Compatibility check in this
> > scheme is separated from aggregation check, then no change is required
> > to the current versioning interface.
> 
> How many features, across how many attributes is an administrative tool
> supposed to check for compatibility?  ie. if we add an 'aggregation'
> feature now and 'translucency' feature next year, with new sysfs
> attributes and creation options, won't that break this scheme?  I'm not
> willing to assume aggregation is the sole new feature we will ever add,
> therefore we don't get to make it a special case without a plan for how
> the next special case will be integrated.

Got you. I thought aggregation was special since it is purely a linear
resource adjustment w/o changing the feature set of the instance, and thus
reasonable to handle specially in the management stack, which needs
to understand this attribute anyway. But I agree that it's difficult to
predict the future and other special cases...

> 
> We also can't even seem to agree that type is a necessary requirement
> for compatibility.  Your discussion below of a type-A, which is
> equivalent to a type-B w/ aggregation set to some value is an example
> of this.  We might also have physical devices with extensions to
> support migration.  These could possibly be compatible with full mdev
> devices.  We have no idea how an administrative tool would discover
> this other than an exhaustive search across every possible target.
> That's ugly but feasible when considering a single target host, but
> completely untenable when considering a datacenter.

If the exhaustive search can be done as a one-off to build a compatibility
database for all assignable devices on each node, then might it still be
tenable in a datacenter?
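
To make the one-off idea concrete, here is a rough sketch of what I have in
mind. It is only an illustration: build_compat_db, migration_targets and the
probe callback are hypothetical names, and probe stands in for whatever
pairwise check the versioning interface ends up providing.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Set, Tuple

# (node name, device type): hypothetical identifiers for an assignable device
Device = Tuple[str, str]


def build_compat_db(
    devices: Iterable[Device],
    probe: Callable[[Device, Device], bool],
) -> Dict[Device, Set[Device]]:
    """Run the exhaustive pairwise search once and cache the results.

    probe(src, dst) is a placeholder for the real compatibility check;
    it is an assumption, not an existing interface.
    """
    devs = list(devices)
    db: Dict[Device, Set[Device]] = {d: set() for d in devs}
    for src, dst in product(devs, devs):
        if src != dst and probe(src, dst):
            db[src].add(dst)
    return db


def migration_targets(db: Dict[Device, Set[Device]], src: Device) -> Set[Device]:
    """Later queries are dictionary lookups rather than a new search."""
    return db.get(src, set())
```

The quadratic cost is paid when the database is built; placement decisions
afterwards only read from it. The database would presumably need a refresh
whenever devices or drivers change.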

> 
> 
> > Then there comes a case where the target device doesn't handle
> > aggregation but support a different type which however provides
> > compatible capabilities and same resource size as the aggregated
> > instance expects. I guess this is one puzzle how to check
> > compatibility between such types. One possible
> > extension is to introduce a non_aggregated_list to indicate compatible
> > non-aggregated types for each aggregated instance. Then mgmt. stack
> > just loop the compatible list if the conservative policy fails.  I didn't think
> > carefully about what format is reasonable here. But if we agree that a
> > separate interface is required to support such usage, then this may come
> > later after the basic migration_version interface is completed.
> 
> ...and then a non_translucency_list and then a non_brilliance_list and
> then a non_whatever_list... no.  Additionally it's been shown difficult
> to predict the future, if a new device is developed to be compatible
> with an existing device it would require updates to the existing device
> to learn about that compatibility.

I suppose a compatibility list like this doesn't require the existing device
to be updated. It should be the new device's responsibility to claim
compatibility with the types carried in the existing list.
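
A minimal sketch of that direction, purely for illustration: TypeInfo,
compatible_with and can_migrate are made-up names, not an existing or
proposed interface. The point is only that the claim lives on the newer
type, so the entry for the existing type never changes.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical description of a device type as seen by a mgmt stack."""
    name: str
    # Older types this type claims it can accept a migration from.
    compatible_with: FrozenSet[str] = field(default_factory=frozenset)


def can_migrate(src_type: str, dst_type: str,
                registry: Dict[str, TypeInfo]) -> bool:
    """Allow identical types, or a destination that claims the source type."""
    if src_type == dst_type:
        return True
    dst = registry.get(dst_type)
    return dst is not None and src_type in dst.compatible_with


# The old entry stays untouched; only the new type carries the claim.
registry = {
    "old-type": TypeInfo("old-type"),
    "new-type": TypeInfo("new-type", frozenset({"old-type"})),
}
```

Note the claim is one-directional here: old-type to new-type is accepted,
while the reverse is not unless old-type is updated to say so.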

> 
> > --
> >
> > Another scenario is about migrating a non-aggregated instance to a device
> > handling aggregation. Then there is an open whether an aggregated type
> > can be used to back the non-aggregated instance in case of no available
> > instance under the original type claimed by non-aggregated instance.
> > This won't happen in KVMGT, because all vGPU types share the same
> > resource pool. Allocating instance under one type also decrement available
> > instances under other types. So if we fail to find available instance under
> > type-A (with 4x resource of type-B), then we will also fail to create an
> > aggregated instance (aggregate=4) under type-B. Therefore, we just
> > need stick to basic type compatibility check for non-aggregated instance.
> > And I feel this assumption can be applied to other devices handling
> > aggregation. It doesn't make sense for two types to claim compatibility
> > (only with resource size difference) when their resources are allocated
> > from different pools (which usually implies different capability or QOS/
> > SLA difference). With this assumption, we don't need provide another
> > interface to indicate compatible aggregated types for non-aggregated
> > interface.
> > --
> >
> > I may definitely overlook something here, but if above analysis sounds
> > reasonable, then this series could be decoupled from the versioning
> > scheme discussion based on conservative policy for now. :)
> 
> The only potential I see for decoupling the discussions would be to do
> aggregation via a vendor attribute.  Those already provide a mechanism
> to manipulate a device after creation and something that we'll already
> need to solve in determining migration compatibility.  So in that
> sense, it seems like it at least doesn't make the problem worse.
> Thanks,
> 

This makes some sense, since 'aggregation' still changes what the
instance looks like anyway. But let me make sure I understand. Are you
proposing actually moving 'aggregation' to be a vendor attribute (i.e.
removing the 'mdev' sub-directory in this patch), or is it more about a
policy of treating it as a vendor attribute? If the former, is there any
problem with having Libvirt manage this attribute, given that it becomes
vendor specific?

Thanks
Kevin



