Re: [PATCH v3 00/16] virObject adjustments for common object

Pavel Hrdina <phrdina@xxxxxxxxxx> · Fri, 28 Jul 2017 10:51:21 +0200

On Thu, Jul 27, 2017 at 04:46:41PM -0400, John Ferlan wrote:
> 
> 
> On 07/25/2017 05:29 AM, Pavel Hrdina wrote:
> > On Thu, Jun 22, 2017 at 10:02:30AM -0400, John Ferlan wrote:
> > 
> > Let's move the discussion [1] into correct place.
> > 
> >> Still, I must be missing something. Why is it wrong to create a new
> >> object that would have a specific use? virObjectLockable was created at
> >> some point in history and then used as a way to provide locking for a
> >> virObject that only had a @ref. Someone could still create a virObject
> >> as well and just get @refs. With the new objects based on those two
> >> objects you get a few more features that allow for add, search, lookup,
> >> remove.  But you call that wrong, which is confusing to me.
> > 
> > What's wrong, or better wording, what I don't like personally about this
> > approach is that in order to have features like add, search, lookup,
> > remove it would require that the object is derived from
> > virObjectLookupKeys.  The only thing it adds to the simple virObject
> > is two variables @uuid and @name.
> > 
> > We don't need the virObjectLookupKeys object at all.
> > 
> 
> First off - thanks for taking the time to put your thoughts here. It
> really helped me understand where I believe you were coming from. There
> are some disadvantages to working at home when it comes to sounding
> boards or drawing things up on a white board for those around you in an
> office to provide/receive face-2-face feedback. Black and white text can
> be harsher than intended.
> 
> Now that I understand that it's less 'wrong' and more 'not necessary' or
> as you put it a personal dislike I can come in off of the ledge ;-).
> Still there's a lot of history w/r/t the initial posting where I was
> encouraged to use/update Objects vs. doing what I did creating a new
> module and essentially renaming objs and objlists. Scaling back and
> considering usage of objects sent me down this current path which hasn't
> received as much feedback. It may not be 100% correct, but I really was
> hoping it wasn't wrong especially since I have working code.
> 
> Storing the UUID/Name in an @obj is the means I use to make certain
> decisions along the way - especially w/r/t Add/Remove logic which as I
> pointed out previously - there's no way to "Add" something without a key
> nor "Remove" it; however, the Lookup/Search logic do not need it because
> the hash table keeps track of the key and all the find/lookup vir*obj
> logic would have the @def->{uuid|name} as input for the search. The
> ForEach logic won't use it, although it could (more later).
> 
> Storing the keys in @obj removes a few decision points from the caller.
> The consumer only cares they've created an object with 1 or 2 keys in
> order to allow the objList code "find" those keys and make decisions
> based on the key rather than having multiple API's that fetch using the
> key name in the API name. Adding to the objlist is just that - adding
> and not caring/knowing how many or which tables the object is using.
> 
> While it's certainly not necessary to have UUID/Name and a LookupKeys
> object, it is convenient and more than just for UUID/Name. The new
> object is (will be) used to add bool's (active, persistent, started,
> etc.) that are found in multiple objects in order to have a common API
> to manage that. It's not required, but something possible. There are
> some other 'shared' type pieces of data (configFile, autostartLink,
> autostart, etc.) that could also make use of "sharing" data and code if

The shared code for storing a list of object is what we need and want,
however adding this 'shared' data into the listing API is out of scope
of that API and not all objects uses this data.  Yes I can see the
benefit, but IMHO it doesn't belong into that API.

> so inclined. Of course the primary reason I've adjusted all those
> obj->def->X to be def = obj->def and def->X access is that moving the
> @def and @newDef inside the object was another initial goal, which needs
> to be rethought/reworked as that was considered too generic and possibly
> prone to coding errors.
> 
> > For example, current add/remove API is:
> > 
> > 
> > ==== virObjectLookupHashNew ====
> > 
> > virObjectLookupHashNew(virClassPtr klass,
> >                        int tableElemsStart);
> > 
> > could be
> > 
> > virObjectListNew(int tableSize,
> >                  bool useName,
> >                  bool useUUID);
> > 
> > or
> > 
> > typedef enum {
> >     VIR_OBJECT_LIST_TABLE_NAME = (1 << 0),
> >     VIR_OBJECT_LIST_TABLE_UUID = (1 << 1),
> > } virObjectListTableType;
> > 
> > virObjectListNew(int tableSize,
> >                  unsigned int flags);
> > 
> > I don't see why the virObjectLookupHashNew() has to have the @klass
> > parameter and the @tableElemsStart is not correct, it's not the number
> > of elements, it's has table size.
> > 
> 
> Well @klass is a parameter for other virObject*New() functions, so why
> should virObjectLookupHashNew not have one?  Or asked differently how
> would virObjectListNew determine it was derived from some class? Does

That's the difference, the proposed VirObjectList is a final class and
currently there is no need to derive any other class from it.

> that leave the hash tables in the calling vir*obj.c files or in some
> common file, such as vircommonobjlist.c? To me that's just a shim to
> virObject for each of the vir*obj.c. I don't like that solution - we

All of the code for virObjectList would be for example in
virobjectlist.c|h files, it would be a separate module that uses
virObject as a building block and also is tailored to contain virObject.

> shouldn't create a shim that doesn't converge objects, we would take the
> plunge and create object code. I also don't like ObjectList if I'm using
> a hash table - it's not a representative name. Perhaps I'm missing
> something else you had in mind.

I chose the "list" because that's what the API does, it's storing a data
addressed by key without any specific order, usually this type of
data structure is "map" or "dictionary", so yes, the "list" is not the
perfect description, we can use "map" or "dict", but "hash" is one of
the possible implementation of this data structure.

> @tableElemsStart is the starting size of the hash table that can grow
> beyond the starting size. Whether it changes names to @tableSize is just
> window dressing to hide the bikes in the shed. I think the reason why I
> used a variable was to allow different sizes for different ObjLists.

Yes, the variable should be there, you should be able to specify the
size of hash table, just the name was confusing.

> There may have been something else but that's long since left the short
> term memory. It might be for the clone or backup copy of an ObjList and
> wanted to have the original table size in order to create the clone.
> Doesn't seem I use it now though, so saving it could be dropped. Heck we
> could drop the initial size logic too and give virHashGrow some
> exercise. Although I see that has some interesting limitations w/r/t
> bucket chain length and larger element count tables where a max of 2048
> buckets can be created allowing bucket chains to potentially grow really
> long. Perhaps not a problem today with test beds using 100's of objects,
> but IIRC there was a storage example using 1000's of volumes if not 10's
> of 1000's. Once you get to about 15K objs, then the probability of
> longer chains increases (I played around with virhashtest a bit - which
> currently only maxes at 250 objs, yawn).
> 
> FWIW: I went away from @useName or @useBool type logic because some
> tables have UUID only, some have Name only, others have both. For those
> with one - there is a waste of space, but if we start with table sizes
> of 1, it's minimized. Again, I preferred @primary and @secondary, but
> that idea was already squashed. In any case, here's a crudely drawn
> table of UUID/Name and the various vir*obj consumers.
> 
>             UUID    |  Name
> ____________________|___________
> NodeDev     No      |  Yes
> ____________________|___________
> Secret      Yes     |  No [1]
> ____________________|___________
> NWFilter    Maybe[2]|  Yes
> ____________________|___________
> Interface   No      |  Yes
> ____________________|___________
> Network     Yes     |  Yes
> ____________________|___________
> StoragePool Yes     |  Yes
> ____________________|___________
> StorageVol  No      |  Yes
> ____________________|___________
> Domain      Yes     |  Yes
> ____________________|___________
> 
> [1] I do have patches that would make the secret usage_id be a secondary key
> 
> [2] Awful piece of history that allows the nwfilter to be defined
> without a UUID if saving the UUID to the file fails, thus allowing a
> future start to overwrite the UUID of a previous nwfilter.
> 
> There's also the possibility that Snapshots could use this, but I don't
> think it's as beneficial since "unpredictable ordering" for HashTables
> could make things AFU'd and I don't suppose you'd end up with 1000's of
> snapshots.
> 
> > 
> > ==== virObjectLookupHash(Add|Remove) ====
> > 
> > virObjectLookupHashAdd(void *tableobj,
> >                        virObjectLookupKeysPtr obj);
> > 
> > virObjectLookupHashRemove(void *tableobj,
> >                           virObjectLookupKeysPtr obj);
> > 
> > The only difference without the virObjectLookupKeys object would be
> > that the API will take two more parameters to give it the @uuid and
> > @name.
> > 
> > virObjectListAdd(virObjectListPtr listObj,
> >                  void *obj,
> >                  const char *name,
> >                  const char *uuid);
> > 
> > virObjectListRemove(virObjectListPtr listObj,
> >                     const char *name,
> >                     const char *uuid);
> > 
> > or
> > 
> > virObjectListRemoveByName(virObjectListPtr listObj,
> >                           const char *name);
> > virObjectListRemoveByUUID(virObjectListPtr listObj,
> >                           const char *uuid);
> > 
> > Yes, at first it may looks worse, but on the other hand it gives you
> > better flexibility in the way that the object added/removed from the
> > list doesn't even have to have the exact same variables and it will
> > work with every object that is derived from virObject.
> > 
> 
> Ahhh.. and this is where our fundamental difference lies. You see
> obj->uuid and obj->name as unnecessary and I find them useful especially
> for the purpose of adding/removing @obj to/from the @listObj.

Agree, for adding/removing you need all keys for all tables, having a
common object with predefined keys makes it easier but also adds a
limitations that you can use the list only for these objects.

> It seems you'd rather see virObject* API's (or perhaps some new
> vircommonobjlist.c) that force the callers to know which call to use
> (ByName/ByUUID) and which table's were created for the object. I prefer
> having a common API that doesn't need to be that specific, but does the
> thing just using an argument.

Yes, that's exactly what I'm proposing, having a separate module, it's
not even an virObject* API.

> Your characterization of "flexible" is my characterization of "control".
> It's not like UUID/Name are going away - they're just being managed in a
> lower layer. My view is that the @obj as created during vir*ObjNew is
> what decides which tables it will need to be present in. The opposing
> view is I have an object, I'm going to place it in this table and
> possibly that table and I know where I want to put it and will manage
> that from the calling function entirely.
> 
> One thing I would like to see changed is *Remove returning status. Not a
> a lot of good happens if something during the Remove processing fails
> and no one is told.
> 
> > 
> > ==== virObjectLookupHashFind ====
> > 
> > virObjectLookupHashFind(void *tableobj,
> >                         bool useUUID,
> >                         const char *key);
> > 
> > could be split into two APIs:
> > 
> > virObjectListFindByName(virObjectListPtr listObj,
> >                         const char *name);
> > virObjectListFindByUUID(virObjectListPtr listObj,
> >                         const char *UUID);
> > 
> > Which in my opinion is way better than having a single API with a
> > boolean parameter that switches what type of key it is.
> > 
> 
> Six of one, half-dozen of the other. The consumer is still going to call
> one or the other ByName/ByUUID, but that leaves that decision point
> higher up. It's easy enough to make ByName and ByUUID API's, but they
> just turn into shims to the common code - so why bother?

Because a bool parameter is not extendable, imagine that we would like
to add another key in future, it would force us to rewrite all the APIs.

> I still prefer the concept of @primary and @secondary. Consider the
> table above - should it matter that a table stores UUID's? or Names? or
> just one or two keys?  We made the decision early on during ObjNew what
> our keys would be.  After that it's up to calling function to provide
> some key that we'll search on. For some consumers which use both tables,
> they want to Find in a specific table and that's why you'd have a
> boolean to say - don't search primary, search secondary.
> 
> How about a different idea. Why not make the hash table count variable
> (1, 2, 3, etc.) and let the caller decide how many to create and which

I had the same idea, however for some object it doesn't give us any
benefit to have 4 or 5 hash tables for different keys because the number
of object stored in the tables would be small.  We have to think
about balance between speed and memory consumption.

> to use. That way the nodedev code removes the 4 or 5 FindBy type API's
> and replaces it with 4 or 5 hash tables to allow for lookup by key. The
> consumer can then have vir*ObjListFind{ByUUID|ByName}, but that ends up
> being a virObjectLookupHashFind(tableobj, tableidx, const char *key)
> where the consumer knows tableidx = 0 is UUID, tableidx = 1 is Name, etc.
> 
> > 
> > ==== virObjectLookupHashForEach ====
> > 
> > virObjectLookupHashForEach(void *tableobj,
> >                            bool useUUID,
> >                            virHashIterator callback,
> >                            virObjectLookupHashForEachDataPtr data);
> > 
> > could be
> > 
> > virObjectListForEach(virObjectListPtr listObj,
> >                      virHashIterator callback,
> >                      void *callbackData);
> > 
> > There are two things that I don't like.  There is no need to have the
> > @useUUID because both tables should contain the same objects if both are
> > used so this can be hidden from the user and the API will simply use one
> > of the available tables.  The second issue is that this API forces you
> > to use some predefined data structure and you cannot create your own
> > specifically tailored for the task that you are about to do.
> > 
> > The @useUUID argument also applies to virObjectLookupHashSearch().
> > 
> 
> [NB: written before rereading and writing the above paragraph regarding
> configurable number of hash tables]
> 
> How do you choose which table? How do you know which data the caller
> wants? Refer to the table above - some objects have UUID and some have
> Name. Some consumers may even want to return a list of all objects by
> UUID using an input parameter and other by Name with a different input
> parameter. The *ForEach ends up being a multi-purpose backend for NumOf,
> Get{Name|UUID}, and Export functions. Typically the Get{Name|UUID} is
> the "primary key" for the object, but I'm just saying...
>
> While I'm typing this response I just realized that the Get{Names|UUIDs}
> type API functions wouldn't need vir*obj.c specific *Callback functions
> to deref obj->def->{name|uuid} since obj->{uuid|name} exists! I'd still
> need some function to determine whether @aclfilter was passed or not for
> @def and I'd to know to @useUUID or not. But I could converge things
> even more than I have.
> 
> > 
> > 
> > 
> > The usage of the virObjectList* APIs could be something like this:
> > 
> > virObjectListPtr
> > virInterfaceObjListNew(void) {
> >     return virObjectListNew(10, VIR_OBJECT_LIST_TABLE_NAME);
> > }
> > 
> > static virInterfaceObjPtr
> > virInterfaceObjListFindByName(virObjectListPtr interfaces,
> >                               const char *name)
> > {
> >     return virObjectListFindByName(interfaces, name);
> > }
> > 
> > Pavel
> > 
> 
> And here's a perfect example of an object that only lives in one table.
> If I call virObjectLookupHashForEach or virObjectLookupHashSearch and
> expect to find something in objsUUID, I'm going to be sorely
> disappointed. One solution is to have two API's - my solution is one API
> with a parameter that dictates.

That's the thing, for APIs ForEach or Search you need to use only one
hash table.  If there is only one, it's easy to pick the correct one, if
there are two or more tables, they should contain the same objects only
mapped by different keys so it doesn't matter which table is used.

Pavel

> Given the amount of change since this series was first posted, I have to
> deal with merge conflicts, the virHashSearch new argument, and using RW
> locks for all the objlists. So I'm going to have to repost the patches
> again anyway.  At this time, I prefer to leave them as is for the
> reasons stated regarding avoiding a shim. If that's the avenue to take,
> then I'll let someone else go down that path. I can finish up cleanup
> and conversion of all the vir*obj code to use vir*Obj and vir*ObjList,
> but converging to a shim is something I'll avoid.
> 
> 
> John
> 
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list
Attachment:
signature.asc

Description: Digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list