Actually, screw that hash. I think we should use SipHash13 for 64-bit and HalfSipHash13 for 32-bit.

SipHash was designed specifically for hash tables that need to resist attack (unlike City64 or Murmur2/3, which can be cracked with math just like the one I proposed), it is used in the Linux kernel and elsewhere, and it is fast and secure. SipHash13 isn't as secure as full SipHash, but it is very fast and still strong enough.

Basically, what I did was modify the hash functions to check for '\0' or '=' as they go, comparing 4 bytes at a time with the strlen() bit-twiddling trick from Hacker's Delight, which musl and glibc also use in their strchrnul() implementations (rough sketch below, after the quoted message). I was originally thinking xxHash, which outperforms these by a lot, but xxHash reads 16 bytes at a time, which is hard to check on the fly without interrupting the flow, and its implementation is much larger.

I also provide an alternative read path that goes one byte at a time instead of reading chunks with memcpy(), because the chunked version technically reads out of bounds. We could potentially give malloc() a bit more space to be safe; generally, though, it isn't a huge deal as long as the read doesn't cross a page boundary. To be safe, I think I should also check alignment and fall back to byte-by-byte reads when the pointer is unaligned, since an unaligned over-read has a chance of crossing a page boundary; that will likely cost a little performance, though.

The speed difference on x86_64 is actually quite reasonable: on average it is only about 40% slower than the add hash, and about the same as the hash from my previous patch. (However, when compiled for i386 targeting the Pentium 4, HalfSipHash is twice as slow, and SipHash is stupidly slow, as expected.) Any speed lost in the hash function itself should be won back by the better distribution a better hash function gives. Both versions save about a second and a half on dash's configure script versus the add hash, but that is probably mostly due to the significantly larger table size.

My benchmark reads a given file line by line, then adds up the hashes of each line, repeated 100 times (a rough sketch of that loop is also below the quoted message). I attached my benchmark program. Define CHECK_FOR_EQUALS to add the '=' check (for hash_var), and ONE_AT_A_TIME to read one byte at a time instead of 4-8. I checked a few strchrnul() implementations, and checking as you go is clearly faster, despite somewhat complicating the control flow. I am working on making the implementations macro-controlled so we can keep things tidy.

As for resizing, do you think we need that? I would rather get this working first and consider resizing afterwards, because this change alone would already make dash faster and much more secure against DoS attacks.

Side note: I just realized that I missed the third usage of the add hash in exec.c.

On Fri, Mar 1, 2019 at 3:47 AM Adam Borowski <kilobyte@xxxxxxxxxx> wrote:
>
> On Thu, Feb 28, 2019 at 11:42:41PM -0500, Devin Hussey wrote:
> > This is a new patch to follow up to my non-cryptographic hash patch
> > that is a little higher quality and has a PoC DoS script.
>
> > 3. Seeding at startup. I try to read from /dev/urandom and fall
> > back to using clock(). This serves as the seed for the hash functions,
> > and makes things much less predictable so it can't just be shut down
> > with a static script.
>
> On any modern platform, please use getentropy() instead. It does the same
> thing as reading from /dev/urandom, yet is:
> * faster (dash is usually preferred over bash because of speed, and a file
>   open is slow enough to be noticeable)
> * immune to file exhaustion attacks
>
>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁
> ⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?
> ⠈⠳⣄⠀⠀⠀⠀
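
To make the '\0'/'=' word-scan concrete, here is a minimal sketch of the
Hacker's Delight trick as described above. hash_name(), HASZERO() and
HASBYTE() are just illustrative names, and the multiply is a dummy mixing
step standing in for the real SipHash13 rounds, so treat this as a picture
of the scanning, not as the actual patch:

    #include <stdint.h>
    #include <string.h>

    /* Hacker's Delight "word has a zero byte" test: nonzero iff any byte
     * of v is 0.  Same trick musl and glibc use in strchrnul(). */
    #define HASZERO(v)     (((v) - 0x01010101u) & ~(v) & 0x80808080u)
    /* Test for a specific byte by XORing it into every byte lane first. */
    #define HASBYTE(v, b)  HASZERO((v) ^ ((b) * 0x01010101u))

    /*
     * Hypothetical helper, not the actual patch: hash a name 4 bytes at a
     * time, stopping at '\0' or '=' (the real thing only checks '=' when
     * CHECK_FOR_EQUALS is defined).
     */
    static uint64_t
    hash_name(const char *s, uint64_t seed)
    {
            uint64_t h = seed;
            uint32_t w;

            for (;;) {
                    /* Word-at-a-time read; may read a few bytes past the end. */
                    memcpy(&w, s, sizeof w);
                    if (HASZERO(w) || HASBYTE(w, (uint32_t)'='))
                            break;          /* a terminator is in this word */
                    h = (h ^ w) * 0x9e3779b97f4a7c15ull;    /* dummy mix */
                    s += sizeof w;
            }
            /* Finish the last 0-3 bytes one at a time, staying in bounds. */
            while (*s != '\0' && *s != '=')
                    h = (h ^ (unsigned char)*s++) * 0x9e3779b97f4a7c15ull;
            return h;
    }

The word loop is the part that can overrun by up to 3 bytes; the
byte-at-a-time tail never leaves the string.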
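
For completeness, here is roughly what the benchmark loop looks like. This
is not the attached program, just the methodology in code form, and it
assumes hash_name() from the sketch above is in the same file:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hash every line of the given file, add up the hashes, repeat 100 times. */
    int
    main(int argc, char *argv[])
    {
            char line[4096];
            uint64_t sum = 0;
            FILE *f;
            int pass;

            if (argc < 2)
                    return 1;
            for (pass = 0; pass < 100; pass++) {
                    f = fopen(argv[1], "r");
                    if (f == NULL)
                            return 1;
                    while (fgets(line, sizeof line, f) != NULL) {
                            line[strcspn(line, "\n")] = '\0';
                            sum += hash_name(line, 0);
                    }
                    fclose(f);
            }
            printf("checksum: %llu\n", (unsigned long long)sum);
            return 0;
    }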
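
And regarding getentropy(): the seeding order Adam suggests would look
something like the sketch below, with getentropy() tried first and the
/dev/urandom and clock() paths kept only as fallbacks. hash_seed_init() is
a placeholder name, and getentropy() needs glibc >= 2.25 / Linux >= 3.17
(it is declared in <unistd.h> there; some other systems put it elsewhere):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>     /* getentropy() */

    /* Seed the hash: getentropy(), then /dev/urandom, then clock(). */
    static uint64_t
    hash_seed_init(void)
    {
            uint64_t seed;
            FILE *f;

            if (getentropy(&seed, sizeof seed) == 0)
                    return seed;
            f = fopen("/dev/urandom", "r");
            if (f != NULL) {
                    size_t n = fread(&seed, 1, sizeof seed, f);
                    fclose(f);
                    if (n == sizeof seed)
                            return seed;
            }
            /* Last resort: not unpredictable, but at least not constant. */
            return (uint64_t)time(NULL) ^ (uint64_t)clock();
    }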