On Tue, Oct 27, 2020 at 10:04:33 +0000, Daniel Berrange wrote:
> On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
> > On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote:
> > > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > > > Glib's hash table provides basically the same functionality as our
> > > > hash table.
> > > >
> > > > In most cases the only thing that remains in the virHash* wrappers
> > > > is NULL-checks of the '@table' argument, as glib's hash functions
> > > > don't tolerate NULL.
> > > >
> > > > In case of iterators, we adapt the existing API of iterators to
> > > > glib's to prevent having to rewrite all callers at this point.
> > > >
> > > > Signed-off-by: Peter Krempa <pkrempa@xxxxxxxxxx>
> > > > ---
> > > >  src/libvirt_private.syms |   4 -
> > > >  src/util/meson.build     |   1 -
> > > >  src/util/virhash.c       | 416 ++++++++++-----------------------------
> > > >  src/util/virhash.h       |   4 +-
> > > >  src/util/virhashcode.c   | 125 ------------
> > > >  src/util/virhashcode.h   |  33 ----
> > >
> > > Our hash code impl uses Murmurhash which makes some efforts to be
> > > robust against malicious inputs triggering collisions, notably by
> > > using a random seed.
> > >
> > > The new code uses g_str_hash which is much weaker, and the API
> > > docs explicitly recommend against using it if the input can be from
> > > an untrusted user.
> >
> > Yes, I've noticed that, but didn't consider it to be that much of a
> > problem, as any untrusted input which is stored in a hash table (so
> > that the attacker can use crafted keys) must in the first place be
> > safeguarded against OOM conditions by limiting the input count/size.
>
> The problem isn't OOM, rather it is algorithmic complexity. With
> malicious hash collisions the runtime lookup performance degrades to
> O(n), which can cause scalability concerns in some cases.

I was pointing out that the input-size limit needed to prevent OOM
conveniently also limits the size of 'n'. The worst case for a malicious
actor that I can see is the block device statistics code, where the
input is capped at 2 * 10 MiB of JSON; at roughly 200 bytes per entry
that works out to around 100k entries, and therefore around 100k hash
comparisons per lookup in the worst case.

As noted though, I think we can keep using the better hash function we
already have. The only difference will probably be that the seed is
global rather than per-table, since glib's hash table doesn't support
per-table seeds. If that's not acceptable we need to keep all the code,
since glib's hash table's hash function prototype is:

  guint (*GHashFunc) (gconstpointer key);
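
To illustrate what I mean, a minimal sketch, assuming virHashCodeGen()
keeps its current prototype from virhashcode.h and that we seed it the
same way as today (the wrapper and initializer names below are just
illustrative, not the final naming):

  #include <glib.h>
  #include <string.h>

  #include "virhashcode.h" /* uint32_t virHashCodeGen(const void *key, size_t len, uint32_t seed); */
  #include "virrandom.h"

  /* One process-wide seed, since GHashFunc has no per-table argument. */
  static uint32_t virHashTableSeed;

  static void
  virHashTableSeedInit(void)
  {
      /* run once at startup, e.g. from a one-time initializer */
      virHashTableSeed = virRandomBits(32);
  }

  /* Adapter matching glib's prototype: guint (*GHashFunc)(gconstpointer) */
  static guint
  virHashStrCode(gconstpointer key)
  {
      return virHashCodeGen(key, strlen(key), virHashTableSeed);
  }

  /* Tables are then created with the seeded hash instead of g_str_hash: */
  GHashTable *
  virHashNew(GDestroyNotify dataFree)
  {
      return g_hash_table_new_full(virHashStrCode, g_str_equal,
                                   g_free, dataFree);
  }

The obvious downside compared to the current code is that a single seed
is shared by every table in the process, but the randomization against
crafted keys is preserved.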