Re: [libvirt] [PATCH 2/5] Convert virDomainObjListPtr to use a hash of domain objects

Daniel Veillard <veillard@xxxxxxxxxx> · Wed, 14 Oct 2009 17:27:03 +0200

On Wed, Oct 14, 2009 at 03:16:19PM +0100, Daniel P. Berrange wrote:
> On Wed, Oct 14, 2009 at 04:04:50PM +0200, Daniel Veillard wrote:
> > On Wed, Oct 14, 2009 at 11:48:51AM +0100, Daniel P. Berrange wrote:
> >   So we went from list to array, and now to hash table :-)
> 
> Yeah, I very much regret switching from the linked list to
> an array. I don't know why I didn't go straight to a hash
> table when I changed it last time :-(

  point is we switched everything (well nearly) to arrays,
but we are only changing domain lookup to hash table.

> > <nitpick>
> > I'm afraid the O(1) is an exageration since as the number grows
> > you would still need to lock each object on the clash list for
> > that hash. But for normal use we should end up in constant time,
> > but in theory I think it's still O(n) ...
> > </nitpick>
> 
> If that ever becomes a problem, which I very much doubt, then
> we can easily enhance the hash table impl to automatically add
> extra buckets once the number of elements passes some threshold
> so you guarentee its always very close to O(1)
> 
> The important change here is really removal of the O(n) mutex
> locks, whcih will always be O(1), because even if there are
> many entries in each hash bucket you're still doing the lookup
> based on the key, not the entry. So you'll only ever need to
> lock the object you eventually find.

  I was nitpicking on the formalism :-) Independantly of O(whatever)
I'm sure it's way faster to jump directly to the hash list.
But I'm afraid we need to check something, looking quickly at the
patch I don't see any locking of the hash table itself during the
lookups, maybe I missed it but there is 2 things I need to raise

  - the obvious that another thread may modify the hash table,
    I assume it's covered by the new R/W locks but just double
    checking
  - the less obvious point that the object hash can't be cached
    because the hash table code allows to grow as needed (up to
    some limit) that's inherited from libxml2 design where I wanted to
    keep small hashes for non-complex document. Since UUID are immutable
    we could drop that virHashGrow() code, go for the larger from the
    start, and avoid recomputing UUID strings and hash everytime
    by caching them in the domain object. Crude but might save some
    cycle.

> > I expect that the gain will come from the fact all lookups intenally
> > are done though the UUID, the Name/ID being only convenience API.
> > I'm also wondering about the symetry for other kind of objects,
> > we probably don't need this for networks (less objects, less
> > operation and operation usually short), you really expect that
> > to be a single optimization and just for domains, right ?
> 
> I think it will probably be worth doing it for storage pools too.
> And once we've done that we might as well do it for networks too
> just to be consistent everywhere, but its not critical for now.

  Yup, let's implement and evaluate for domains, then we will
see if consistency really matters.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@xxxxxxxxxxxx  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list