On Tue, Oct 27, 2020 at 01:09:38PM +0100, Peter Krempa wrote: > On Tue, Oct 27, 2020 at 10:04:33 +0000, Daniel Berrange wrote: > > On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote: > > > On Mon, Oct 26, 2020 at 16:08:34 +0000, Daniel Berrange wrote: > > > > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote: > > > > > Glib's hash table provides basically the same functionality as our hash > > > > > table. > > > > > > > > > > In most cases the only thing that remains in the virHash* wrappers is > > > > > NULL-checks of '@table' argument as glib's hash functions don't tolerate > > > > > NULL. > > > > > > > > > > In case of iterators, we adapt the existing API of iterators to glibs to > > > > > prevent having rewrite all callers at this point. > > > > > > > > > > Signed-off-by: Peter Krempa <pkrempa@xxxxxxxxxx> > > > > > --- > > > > > src/libvirt_private.syms | 4 - > > > > > src/util/meson.build | 1 - > > > > > src/util/virhash.c | 416 ++++++++++----------------------------- > > > > > src/util/virhash.h | 4 +- > > > > > src/util/virhashcode.c | 125 ------------ > > > > > src/util/virhashcode.h | 33 ---- > > > > > > > > Our hash code impl uses Murmurhash which makes some efforts to be > > > > robust against malicious inputs triggering collisons, notably using > > > > a random seed. > > > > > > > > The new code uses g_str_hash which is much weaker, and the API > > > > docs explicitly recommend against using it if the input can be from > > > > an untrusted user. > > > > > > Yes, I've noticed that, but didn't consider it to be that much of a > > > problem as any untrusted input which is stored in a hash table (so that > > > the attacker can use crafted keys) must be in the first place > > > safeguarded against OOM condition by limiting the input count/size. > > > > The problem isn't OOM, rather it is algorithmic complexity. With malicious > > hash collisions the runtime lookup performance degrades to O(n) which can > > cause scalability concerns in some cases. > > I was pointing out that limiting the input size needed for OOM limit > conveniently limits the size of 'n'. > > The worst case for a malicious actor that I can see is the block device > statistics code, where the worst case input would be based on 2 * 10 MiB > of json, where based on 200 bytes per entry you could achieve 100k hash > comparisons. > > As noted though, I think we can use the better hash function we have. > > The only difference will be probably that the seed will be global and > not per-table since glibs table doesn't support that. If that's not > acceptable we need to keep all the code since glibs hash table's hash > function prototype is: A one-time seed is likely fine. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|