For a benchmark, I timed dash's configure script both cold and warm:

    time dash dash/configure

This is all on my Core i7 on macOS.

cold:

old dash:
real    0m10.130s
user    0m5.071s
sys     0m3.795s

new dash:
real    0m9.090s
user    0m4.900s
sys     0m3.487s

warm:

old dash:
real    0m9.179s
user    0m5.078s
sys     0m3.392s

new dash:
real    0m8.746s
user    0m4.944s
sys     0m3.258s

GNU nano's rather large (due to Gnulib) configure script, warm, quiet
output, cached with -C:

    time dash nano/configure -C -q >/dev/null

old dash:
real    0m4.758s
user    0m2.715s
sys     0m2.080s

new dash:
real    0m4.444s
user    0m2.599s
sys     0m2.022s

this_kills_dash.sh from earlier, cold:

    time dash this_kills_dash.sh

old dash:
real    2m3.371s
user    2m3.038s
sys     0m0.298s

new dash:
real    0m0.841s
user    0m0.625s
sys     0m0.159s

The combination of the larger table and the better bucket distribution
actually improves performance a bit, and of course, flooding it now
takes brute force instead of a simple C++ generator program.

On Sun, Mar 3, 2019 at 7:01 PM Devin Hussey <husseydevin@xxxxxxxxx> wrote:
>
> Oops, it looks like my attachment disappeared. Anyway, here is a new
> version of the patch and the hash benchmark I meant to attach before
> (it has an older version of the modified SipHash).
>
> Changes:
> - Added the SipHash. It is a header that is included twice: once for
> strings only, again for variable names. It is one function body whose
> behavior is modified by the preprocessor.
> I go into details about the hash in the message. (I forgot to add
> that byte swapping is removed, because who cares in a hash table?)
> - Added a check for alignment; we read byte by byte if unaligned.
> - Changed exec.c to use SipHash as well.
> - Used getentropy(), falling back to /dev/urandom, then to mixing
> (with XXH32_round from xxHash) the time, the cycle count, and seeded
> rand() calls as a last resort.
> - Added checks for getentropy and sys/random.h to configure.ac.
> - Renamed hash_alias to hash_string.
> - Use power of 2 hash tables, because SipHash is good enough to
> warrant it.
>
> How is performance on your side? My computer and my phone are pretty
> old (Sandy Bridge and Cortex-A15), and I would like to know how this
> performs on other platforms.
>
>
> On Sun, Mar 3, 2019 at 1:24 AM Devin Hussey <husseydevin@xxxxxxxxx> wrote:
> >
> > Actually, screw that hash. I think we should use SipHash13 for 64-bit
> > and HalfSipHash13 for 32-bit. SipHash is designed for strong hash
> > tables (instead of City64/Murmur2/3, which can be cracked with math
> > just like this one), is used in the kernel and elsewhere, and is
> > fast and secure.
> >
> > While SipHash13 isn't as secure, it is very fast and still strong
> > enough.
> >
> > Basically, what I did was modify the hash functions to check for
> > '\0' or '=' by comparing 4 bytes at a time with the strlen bit-
> > twiddling method from Hacker's Delight, which is also used by musl
> > and glibc in their strchrnul implementations. I was originally
> > considering xxHash, which outperforms SipHash by a lot, but xxHash
> > reads 16 bytes at a time, which is difficult to check dynamically
> > without interrupting the flow. It is also a much larger hash.
> >
> > I provide an alternative method that reads one byte at a time
> > instead of reading chunks with memcpy, because the chunked version
> > technically reads out of bounds; we could give malloc a bit more
> > space to be safe. Generally, though, it isn't a huge deal as long as
> > the read doesn't cross a page boundary. To be safe, I think I should
> > check alignment and fall back to byte-by-byte reads when unaligned,
> > because an unaligned out-of-bounds read has a chance of crossing a
> > page boundary; however, that will likely impact performance a bit.
> >
> > The speed difference on x86_64 is actually quite reasonable. On
> > average it is only about 40% slower than the add hash.
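The word-at-a-time '\0'/'=' check described above, based on the Hacker's Delight zero-byte trick that musl's strchrnul also uses, might look roughly like the sketch below. The names are illustrative, not from the actual patch, and per the discussion above the caller must guarantee that the 4-byte loads stay inside allocated memory (e.g. by padding the buffer), since the last chunk may extend past the terminator:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero-byte trick from Hacker's Delight (also used by musl's strchrnul):
 * nonzero iff any of the four bytes in v is 0x00. */
static int haszero32(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Illustrative scanner: length of s up to the first '\0' or '='. */
static size_t scan_name(const char *s)
{
    const char *p = s;
    uint32_t w;

    for (;;) {
        memcpy(&w, p, sizeof w);              /* chunked load */
        if (haszero32(w) ||                   /* contains '\0'? */
            haszero32(w ^ 0x3d3d3d3du))       /* contains '='?  */
            break;
        p += sizeof w;
    }
    while (*p != '\0' && *p != '=')           /* finish byte by byte */
        p++;
    return (size_t)(p - s);
}
```

A macro like the benchmark's CHECK_FOR_EQUALS would presumably just compile the '=' test in or out.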
> > (However, when compiled for i386 targeting the Pentium 4,
> > HalfSipHash is twice as slow, and SipHash is stupid slow, as
> > expected), and it is about the same as the one I just mentioned. Any
> > speed lost to the hash function itself will be countered by the
> > better distribution a better hash function provides. Both versions
> > save about a second and a half on dash's configure script vs. the
> > add hash, but that is probably mostly due to the significantly
> > larger table size.
> >
> > My benchmark reads a given file line by line, then adds up the sums
> > of the hashes for each line, 100 times. I attached my benchmark
> > program. Define CHECK_FOR_EQUALS to add checks for '=' (for
> > hash_var), and ONE_AT_A_TIME to read only one byte at a time instead
> > of 4-8.
> >
> > I checked a few strchrnul implementations, and the check-as-you-go
> > approach is clearly faster, despite complicating the control flow a
> > bit.
> >
> > I am working on making the implementations macro-controlled, so we
> > can keep things tidy.
> >
> > As for resizing, do you think we need that? I think we should finish
> > this first and then consider it, because simply doing this would
> > make dash faster and much more secure against DoS attacks.
> >
> > Side note: I just realized that I missed the third usage of the add
> > hash in exec.c.
> >
> > On Fri, Mar 1, 2019 at 3:47 AM Adam Borowski <kilobyte@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 28, 2019 at 11:42:41PM -0500, Devin Hussey wrote:
> > > > This is a new patch to follow up my non-cryptographic hash
> > > > patch; it is a little higher quality and has a PoC DoS script.
> > > >
> > > > 3. Seeding at startup. I try to read from /dev/urandom and fall
> > > > back to using clock(). This serves as the seed for the hash
> > > > functions, and makes things much less predictable, so it can't
> > > > just be shut down with a static script.
> > >
> > > On any modern platform, please use getentropy() instead.
> > > It does the same thing as reading from /dev/urandom, yet is:
> > > * faster (dash is usually preferred over bash because of speed,
> > >   and a file open is slow enough to be noticeable)
> > > * immune to file descriptor exhaustion attacks
> > >
> > >
> > > Meow!
> > > --
> > > ⢀⣴⠾⠻⢶⣦⠀
> > > ⣾⠁⢠⠒⠀⣿⡁
> > > ⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?
> > > ⠈⠳⣄⠀⠀⠀⠀
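The getentropy()-first seeding discussed in this thread might look roughly like the sketch below. seed_hash is an illustrative name, HAVE_GETENTROPY stands in for whatever configure.ac ends up defining, and the time/cycle-count/rand() last resort the patch mentions is left out:

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#ifdef HAVE_GETENTROPY
#include <sys/random.h>   /* getentropy(), per the configure.ac check */
#endif

/* Fill a 128-bit SipHash key: getentropy() first (no file open, so it
 * is fast and immune to fd exhaustion), then /dev/urandom.  Returns -1
 * if both fail; the caller would then mix in the time, cycle count,
 * and rand() as described in the thread (omitted here). */
static int seed_hash(uint64_t key[2])
{
#ifdef HAVE_GETENTROPY
    if (getentropy(key, 2 * sizeof(uint64_t)) == 0)
        return 0;
#endif
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, key, 2 * sizeof(uint64_t));
        close(fd);
        if (n == (ssize_t)(2 * sizeof(uint64_t)))
            return 0;
    }
    return -1;
}
```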