Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Tue, 27 Oct 2020 12:11:33 +0000

On Tue, Oct 27, 2020 at 01:09:38PM +0100, Peter Krempa wrote:
> On Tue, Oct 27, 2020 at 10:04:33 +0000, Daniel Berrange wrote:
> > On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
> > > On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote:
> > > > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > > > > Glib's hash table provides basically the same functionality as our hash
> > > > > table.
> > > > > 
> > > > > In most cases the only thing that remains in the virHash* wrappers is
> > > > > NULL-checks of '@table' argument as glib's hash functions don't tolerate
> > > > > NULL.
> > > > > 
> > > > > In case of iterators, we adapt the existing API of iterators to glibs to
> > > > > prevent having rewrite all callers at this point.
> > > > > 
> > > > > Signed-off-by: Peter Krempa <pkrempa@xxxxxxxxxx>
> > > > > ---
> > > > >  src/libvirt_private.syms |   4 -
> > > > >  src/util/meson.build     |   1 -
> > > > >  src/util/virhash.c       | 416 ++++++++++-----------------------------
> > > > >  src/util/virhash.h       |   4 +-
> > > > >  src/util/virhashcode.c   | 125 ------------
> > > > >  src/util/virhashcode.h   |  33 ----
> > > > 
> > > > Our hash code impl uses Murmurhash which makes some efforts to be
> > > > robust against malicious inputs triggering collisons, notably using
> > > > a random seed.
> > > > 
> > > > The new code uses  g_str_hash which is much weaker, and the API
> > > > docs explicitly recommend against using it if the input can be from
> > > > an untrusted user.
> > > 
> > > Yes, I've noticed that, but didn't consider it to be that much of a
> > > problem as any untrusted input which is stored in a hash table (so that
> > > the attacker can use crafted keys) must be in the first place
> > > safeguarded against OOM condition by limiting the input count/size.
> > 
> > The problem isn't OOM, rather it is algorithmic complexity. With malicious
> > hash collisions the runtime lookup performance degrades to O(n) which can
> > cause scalability concerns in some cases.
> 
> I was pointing out that limiting the input size needed for OOM limit
> conveniently limits the size of 'n'.
> 
> The worst case for a malicious actor that I can see is the block device
> statistics code, where the worst case input would be based on 2 * 10 MiB
> of json, where based on 200 bytes per entry you could achieve 100k hash
> comparisons.
> 
> As noted though, I think we can use the better hash function we have.
> 
> The only difference will be probably that the seed will be global and
> not per-table since glibs table doesn't support that. If that's not
> acceptable we need to keep all the code since glibs hash table's hash
> function prototype is:

A one-time seed is likely fine.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|