On Thu, Aug 20, 2020 at 02:24:26PM +0100, Sean Mooney wrote: > On Thu, 2020-08-20 at 14:27 +0800, Yan Zhao wrote: > > On Thu, Aug 20, 2020 at 06:16:28AM +0100, Sean Mooney wrote: > > > On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote: > > > > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote: > > > > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > > > > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > > > > > Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > > > > > > > > > |- [path to device] > > > > > > > > > |--- migration > > > > > > > > > | |--- self > > > > > > > > > | | |---device_api > > > > > > > > > | | |---mdev_type > > > > > > > > > | | |---software_version > > > > > > > > > | | |---device_id > > > > > > > > > | | |---aggregator > > > > > > > > > | |--- compatible > > > > > > > > > | | |---device_api > > > > > > > > > | | |---mdev_type > > > > > > > > > | | |---software_version > > > > > > > > > | | |---device_id > > > > > > > > > | | |---aggregator > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > > > > > - Attribute is coupled with kobject > > > > > > > > > > > > > > Is that really that bad? You have the device with an embedded kobject > > > > > > > anyway, and you can just put things into an attribute group? > > > > > > > > > > > > > > [Also, I think that self/compatible split in the example makes things > > > > > > > needlessly complex. Shouldn't semantic versioning and matching already > > > > > > > cover nearly everything? I would expect very few cases that are more > > > > > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > > > > > need that self/compatible split for that, either.] > > > > > > > > > > > > Hi Cornelia, > > > > > > > > > > > > The reason I want to declare compatible list of attributes is that > > > > > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > > > > > as I demonstrated below, > > > > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > > > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > > > > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > > > > > > > > > the way you are doing the nameing is till really confusing by the way > > > > > if this has not already been merged in the kernel can you chagne the mdev > > > > > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device > > > > > > > > > > currently you need to deived the aggratod by the number at the end of the mdev type to figure out > > > > > how much of the phsicial device is being used with is a very unfridly api convention > > > > > > > > > > the way aggrator are being proposed in general is not really someting i like but i thin this at least > > > > > is something that should be able to correct. > > > > > > > > > > with the complexity in the mdev type name + aggrator i suspect that this will never be support > > > > > in openstack nova directly requireing integration via cyborg unless we can pre partion the > > > > > device in to mdevs staicaly and just ignore this. > > > > > > > > > > this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee > > > > > taht how aggreator work will be portable across vendors genericly. > > > > > > > > > > > > > > > > > and aggragator may be just one of such examples that 1:1 matching does not > > > > > > fit. > > > > > > > > > > for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change. > > > > > > > > > > > > > hi Sean, > > > > I understand it's hard for openstack. but 1:N is always meaningful. > > > > e.g. > > > > if source device 1 has cap A, it is compatible to > > > > device 2: cap A, > > > > device 3: cap A+B, > > > > device 4: cap A+B+C > > > > .... > > > > to allow openstack to detect it correctly, in compatible list of > > > > device 2, we would say compatible cap is A; > > > > device 3, compatible cap is A or A+B; > > > > device 4, compatible cap is A or A+B, or A+B+C; > > > > > > > > then if openstack finds device A's self cap A is contained in compatible > > > > cap of device 2/3/4, it can migrate device 1 to device 2,3,4. > > > > > > > > conversely, device 1's compatible cap is only A, > > > > so it is able to migrate device 2 to device 1, and it is not able to > > > > migrate device 3/4 to device 1. > > > > > > yes we build the palcement servce aroudn the idea of capablites as traits on resocue providres. > > > which is why i originally asked if we coudl model compatibality with feature flags > > > > > > we can seaislyt model deivce as aupport A, A+B or A+B+C > > > and then select hosts and evice based on that but > > > > > > the list of compatable deivce you are propsoeing hide this feature infomation which whould be what we are matching > > > on. > > > > > > give me a lset of feature you want and list ting the feature avaiable on each device allow highre level ocestation > > > to > > > easily match the request to a host that can fulllfile it btu thave a set of other compatihble device does not help > > > with > > > that > > > > > > so if a simple list a capabliteis can be advertiese d and if we know tha two dievce with the same capablity are > > > intercahangebale that is workabout i suspect that will not be the case however and it would onely work within a > > > familay > > > of mdevs that are closely related. which i think agian is an argument for not changeing the mdev type and at least > > > intially only look at migatreion where the mdev type doee not change initally. > > > > > > > sorry Sean, I don't understand your words completely. > > Please allow me to write it down in my words, and please confirm if my > > understanding is right. > > 1. you mean you agree on that each field is regarded as a trait, and > > openstack can compare by itself if source trait is a subset of target trait, right? > > e.g. > > source device > > field1=A1 > > field2=A2+B2 > > field3=A3 > > > > target device > > field1=A1+B1 > > field2=A2+B2 > > filed3=A3 > > > > then openstack sees that field1/2/3 in source is a subset of field1/2/3 in > > target, so it's migratable to target? > > yes this is basically how cpu feature work. > if we see the host cpu on the dest is a supperset of the cpu feature used > by the vm we know its safe to migrate. got it. glad to know it :) > > > > 2. mdev_type + aggregator make it hard to achieve the above elegant > > solution, so it's best to avoid the combined comparing of mdev_type + aggregator. > > do I understand it correctly? > yes and no. one of the challange that mdevs pose right now is that sometiem mdev model > independent resouces and sometimes multipe mdev types consume the same underlying resouces > there is know way for openstack to know if i915-GVTg_V5_2 and i915-GVTg_V5_4 consume the same resouces > or not. as such we cant do the accounting properly so i would much prefer to have just 1 mdev type > i915-GVTg and which models the minimal allocatable unit and then say i want 4 of them comsed into 1 device > then have a second mdev type that does that since > > what that means in pratice is we cannot trust the available_instances for a given mdev type > as consuming a different mdev type might change it. aggrators makes that problem worse. > which is why i siad i would prefer if instead of aggreator as prposed each consumable > resouce was reported indepenedly as different mdev types and then we composed those > like we would when bond ports creating an attachment or other logical aggration that refers > to instance of mdevs of differing type which we expose as a singel mdev that is exposed to the guest. > in a concreate example we might say create a aggreator of 64 cuda cores and 32 tensor cores and "bond them" > or aggrate them as a single attachme mdev and provide that to a ml workload guest. a differnt guest could request > 1 instace of the nvenc video encoder and one instance of the nvenc video decoder but no cuda or tensor for a video > transcoding workload. > The "bond" you described is a little different from the intension of the aggregator we introduced for scalable IOV. (as explained in another mail to Cornelia https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg06523.html). But any way, we agree that mdevs are not compatible if mdev_types are not compatible. > if each of those componets are indepent mdev types and can be composed with that granularity then i think that approch > is better then the current aggreator with vendor sepcific fileds. > we can model the phsical device as being multipel nested resouces with different traits for each type of resouce and > different capsities for the same. we can even model how many of the attachments/compositions can be done indepently > if there is a limit on that. > > |- [parent physical device] > |--- Vendor-specific-attributes [optional] > |--- [mdev_supported_types] > | |--- [<type-id>] > | | |--- create > | | |--- name > | | |--- available_instances > | | |--- device_api > | | |--- description > | | |--- [devices] > | |--- [<type-id>] > | | |--- create > | | |--- name > | | |--- available_instances > | | |--- device_api > | | |--- description > | | |--- [devices] > | |--- [<type-id>] > | |--- create > | |--- name > | |--- available_instances > | |--- device_api > | |--- description > | |--- [devices] > > a benifit of this appoch is we would be the mdev types would not change on migration > and we could jsut compuare a a simeple version stirgh and feature flag list to determin comaptiablity > in a vendor neutral way. i dont nessisarly need to know what the vendeor flags mean just that the dest is a subset of > the source and that the semaitic version numbers say the mdevs are compatible. > > as aggregator and some other attributes are only meaningful after devices are created, and vendors' naming of mdev types are not unified, do you think below way is good? |- [parent physical device] |--- [mdev_supported_types] | |--- [<type-id>] | | |--- create | | |--- name | | |--- available_instances | | |--- compatible_type [must] | | |--- Vendor-specific-compatible-type-attributes [optional] | | |--- device_api [must] | | |--- software_version [must] | | |--- description | | |--- [devices] | | |--------[<uuid>] | | | |--- vendor-specific-compatible-device-attriutes [optional] all vendor specific compatible attributes begin with compatible in name. in GVT's current case, |- 0000\:00\:02.0 |--- mdev_supported_types | |--- i915-GVTg_V5_8 | | |--- create | | |--- name | | |--- available_instances | | |--- compatible_type : i915-GVTg_V5_8, i915-GVTg_V4_8 | | |--- device_api : vfio-pci | | |--- software_version : 1.0.0 | | |--- compatible_pci_ids : 5931, 591b | | |--- description | | |--- devices | | | |- 882cc4da-dede-11e7-9180-078a62063ab1 | | | | | --- aggregator : 1 | | | | | --- compatible_aggregator : 1 suppose 882cc4da-dede-11e7-9180-078a62063ab1 is a src mdev. the sequence for openstack to find a compatible mdev in my mind is that 1. make src mdev type and compatible_type as traits. 2. look for a mdev type that is either i915-GVTg_V4_8 or i915-GVTg_V5_8 as that in compatible_type. (this is just an example, currently we only support migration between mdevs whose attributes are all matching, from mdev type to aggregator, to pci_ids) 3. if 2 fails, try to find a mdev type whose compatible_type is a superset of src compatible_type. if found one, go to step 4; otherwise, quit. 4. check if device_api, software_version under the type are compatible. 5. check if other vendor specific type attributes under the type are compatible. - check if src compatible_pci_ids is a subset of target compatible_pci_ids. 6. check if device is created and not occupied, if not, create one. 7. check if vendor specific attributes under the device are compatible. - check if src compatible_aggregator is a subset of target compatible_aggregator. if fails, try to find counterpart attribute of vendor specific device attribute and set target value according to compatible_xxx in source side. (for compatible_aggregator, its counterpart is aggregator.) if attribute aggregator exists, step 7 succeeds when setting of its value succeeds. if attribute aggregator does not exist, step 7 fails. 8. a compatible target is found. not sure if the above steps look good to you. some changes are required for compatibility check for physical device when mdev_type is absent. but let's first arrive at consensus for mdevs first :) > > 3. you don't like self list and compatible list, because it is hard for > > openstack to compare different traits? > > e.g. if we have self list and compatible list, then as below, openstack needs > > to compare if self field1/2/3 is a subset of compatible field 1/2/3. > currnetly we only use mdevs for vGPUs and in our documentaiton we tell customer > to model the mdev_type as a trait and request it as a reuiqred trait. > so for customer that are doing that today changing mdev types is not really an option. > we would prefer that they request the feature they need instead of a spefic mdev type > so we can select any that meets there needs > for example we have a bunch of traits for cuda support > https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/cuda.py > or driectx/vulkan/opengl https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/api.py > these are closely analogous to cpu feature flag lix avx or sse > https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86/__init__.py#L16 > > so when it comes to compatiablities it would be ideal if you could express capablities as something like > a cpu feature flag then we can eaisly model those as traits. > > > > source device: > > self field1=A1 > > self field2=A2+B2 > > self field3=A3 > > > > compatible field1=A1 > > compatible field2=A2;B2;A2+B2; > > compatible field3=A3 > > > > > > target device: > > self field1=A1+B1 > > self field2=A2+B2 > > self field3=A3 > > > > compatible field1=A1;B1;A1+B1; > > compatible field2=A2;B2;A2+B2; > > compatible field3=A3 > > > > > > Thanks > > Yan > > > > > > > > > > > > > > > > > i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the > > > > > aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, > > > > > have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg > > > > > > > > > > if you want to have muplie mdev type to model the different amoutn of the resouce e.g. i915-GVTg_small i915- > > > > > GVTg_large > > > > > that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg > > > > > > > > > > failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user > > > > > level > > > > > with > > > > > some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an > > > > > aggreateion > > > > > of > > > > > multiple sub resouces each of which is an mdev. so kind of like how bond port work. we would create an mdev for > > > > > each > > > > > of > > > > > the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only > > > > > the > > > > > aggreated mdev to the instance. > > > > > > > > > > the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms > > > > > on > > > > > top > > > > > of it even to boot them let alone migrate them. > > > > > > > > > > > > So, we explicitly list out self/compatible attributes, and management > > > > > > tools only need to check if self attributes is contained compatible > > > > > > attributes. > > > > > > > > > > > > or do you mean only compatible list is enough, and the management tools > > > > > > need to find out self list by themselves? > > > > > > But I think provide a self list is easier for management tools. > > > > > > > > > > > > Thanks > > > > > > Yan > > > > > > > > > > > > > > > > > > >