On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote: > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote: > > Glib's hash table provides basically the same functionality as our hash > > table. > > > > In most cases the only thing that remains in the virHash* wrappers is > > NULL-checks of '@table' argument as glib's hash functions don't tolerate > > NULL. > > > > In case of iterators, we adapt the existing API of iterators to glibs to > > prevent having rewrite all callers at this point. > > > > Signed-off-by: Peter Krempa <pkrempa@xxxxxxxxxx> > > --- > > src/libvirt_private.syms | 4 - > > src/util/meson.build | 1 - > > src/util/virhash.c | 416 ++++++++++----------------------------- > > src/util/virhash.h | 4 +- > > src/util/virhashcode.c | 125 ------------ > > src/util/virhashcode.h | 33 ---- > > Our hash code impl uses Murmurhash which makes some efforts to be > robust against malicious inputs triggering collisons, notably using > a random seed. > > The new code uses g_str_hash which is much weaker, and the API > docs explicitly recommend against using it if the input can be from > an untrusted user. Yes, I've noticed that, but didn't consider it to be that much of a problem as any untrusted input which is stored in a hash table (so that the attacker can use crafted keys) must be in the first place safeguarded against OOM condition by limiting the input count/size. Apart from insertion we do have at least one place where lookup is based on untrusted data so there is a possibility for bigger scope of impact. Said that I don't really believe that the complexity vs impact ratio makes it worth even for a completely static hashing function, but when we can do better, we should. > IOW, I don't think we should be removing our virhashcode impl. If > anything we should upgrade our hash code impl to use SipHash which > is used by perl, python, ruby, rust and more. Okay, I'll leave virhashcode in. Regardless whether we have a "cool" hashing function or not, devs need to be vigilant of use of any data structure based on untrusted data though.