Re: [PATCH] Use a larger table, initial seeds, and a better hash function to prevent flooding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



For a benchmark, I timed dash's configure script both cold and warm

    time dash dash/configure

This is all on my Core i7 on macOS.

cold:
old dash:
real 0m10.130s
user 0m5.071s
sys 0m3.795s
new dash:
real 0m9.090s
user 0m4.900s
sys 0m3.487s

warm:
old dash:
real 0m9.179s
user 0m5.078s
sys 0m3.392s
new dash:
real 0m8.746s
user 0m4.944s
sys 0m3.258s

GNU Nano's rather large (due to Gnulib) configure script, warm, quiet
output, cached with -C

    nano/configure -C
    time dash nano/configure -C -q >/dev/null

old dash:
real 0m4.758s
user 0m2.715s
sys 0m2.080s
new dash:
real 0m4.444s
user 0m2.599s
sys 0m2.022s

this_kills_dash.sh from earlier, cold

    time dash this_kills_dash.sh

old dash:
real 2m3.371s
user 2m3.038s
sys 0m0.298s
new dash:
real 0m0.841s
user 0m0.625s
sys 0m0.159s

The combination of the larger table and the better bucket distribution
actually improves performance a bit, and of course, there needs to be
brute force to flood it instead of a simple C++ program generator.

On Sun, Mar 3, 2019 at 7:01 PM Devin Hussey <husseydevin@xxxxxxxxx> wrote:
>
> Oops, it looks like my attachment disappeared. Anyways, here is a new
> version of the patch and the hash benchmark I meant to attach before
> (it has an older version of the modified SipHash)
>
> Changes:
>   - Added the SipHash. It is a header that is included twice, once for
> strings only, again for variable names. It is one function body with
> behavior modified by the preprocessor.
>     I go into details about the hash in the message. (I forgot to add
> that byte swapping is removed because who cares in a hash table?)
>   - I made a check for alignment and check byte by byte if we are unaligned.
>   - Changed exec.c to use SipHash as well.
>   - Used getentropy(), falling back to /dev/urandom, then mixing (with
> XXH32_round from xxHash) the time, cycle count, and seeded rand()
> calls as a last resort.
>   - I added getentropy and sys/random.h to configure.ac.
>   - Renamed hash_alias to hash_string.
>   - Use power of 2 hash tables because SipHash is good enough to warrant it.
>
> How is performance on your side? My computer and my phone are pretty
> old (Sandy Bridge and Cortex-A15), and I would like to know how this
> performs on other platforms
>
>
> On Sun, Mar 3, 2019 at 1:24 AM Devin Hussey <husseydevin@xxxxxxxxx> wrote:
> >
> > Actually, screw that hash. I think we should use SipHash13 for 64-bit
> > and HalfSipHash13 for 32-bit. SipHash is designed for strong hash
> > tables (instead of City64/Murmur2/3 which can be cracked with math
> > just like this one), used in the kernel and more, and is fast and
> > secure.
> >
> > While SipHash13 isn't as secure, it is very fast and still strong enough.
> >
> > Basically, what I did was modified the hash functions to check for
> > '\0' or '=' by comparing 4 bytes at a time with the strlen bit
> > twiddling method from Hackers' Delight, which is also used by Musl and
> > glibc in their strchrnul implementations. I was originally thinking
> > xxHash, which outperforms by a lot, but xxHash reads 16 bytes at a
> > time which is difficult to dynamically check without interrupting the
> > flow) It is also a much larger hash.
> >
> > I provide an alternative method for reading that reads one byte at a
> > time instead of reading chunks with memcpy because technically it does
> > read out of bounds, but we can potentially give malloc a bit more
> > space to be safe. Generally, though, it isn't a huge deal if it
> > doesn't go out of the page boundary. I think I should, to be safe,
> > check alignment and use byte by byte if unaligned to be safe, because
> > reading out of bounds unaligned has a chance of crossing a boundary,
> > however, it will likely impact performance a bit.
> >
> > The speed difference on x86_64 is actually quite reasonable. On
> > average it is only about 40% slower than the add hash.
> > (However, when compiled for i386 targeting Pentium 4, HalfSipHash is
> > twice as slow, and SipHash is stupid slow as expected), and
> > about the same as the one I just mentioned. Any speed offsets from the
> > hash function will be countered by the better
> > distribution from a better hash function. Both versions save about a
> > second and a half on dash's configure script vs the add hash, but that
> > is probably mostly due to the significantly larger table size.
> >
> > My benchmark read a file given line by line, then added up the sums of
> > the hashes
> > for each line 100 times. I attached my benchmark program. Define
> > CHECK_FOR_EQUALS to add checks for '=' (for hash_var), and
> > ONE_AT_A_TIME to only read
> > one byte at a time instead of reading 4-8.
> >
> > I checked a few strchrnul implementations and clearly the check as you
> > go is faster, despite kind of complicating the control flow.
> >
> > I am working on changing the implementations to be macro controlled,
> > so we can keep things tidy.
> >
> > As for resizing, do you think we need that? I think first work on this
> > and then consider it. Because simply doing this would make Dash faster
> > and much more secure against DoS attacks.
> >
> > Side note: I just realized that I missed the third usage of the add
> > hash in exec.c.
> >
> > On Fri, Mar 1, 2019 at 3:47 AM Adam Borowski <kilobyte@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 28, 2019 at 11:42:41PM -0500, Devin Hussey wrote:
> > > > This is a new patch to follow up to my non-cryptographic hash patch
> > > > that is a little higher quality and has a PoC DoS script.
> > >
> > > > 3.    Seeding at startup. I try to read from /dev/urandom and fall
> > > > back to using clock(). This serves as the seed for the hash functions,
> > > > and makes things much less predictable so it can't just be shut down
> > > > with a static script.
> > >
> > > On any modern platform, please use getentropy() instead.  It does the same
> > > thing as reading from /dev/urandom, yet is:
> > > * faster (dash is usually preferred over bash because of speed, and a file
> > >   open is slow enough to be noticeable)
> > > * immune to file exhaustion attacks
> > >
> > >
> > > Meow!
> > > --
> > > ⢀⣴⠾⠻⢶⣦⠀
> > > ⣾⠁⢠⠒⠀⣿⡁
> > > ⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?
> > > ⠈⠳⣄⠀⠀⠀⠀




[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux