[Last-Call] Re: last call comments on FNV

Donald Eastlake <d3e3e3@xxxxxxxxx> · Tue, 15 Oct 2024 12:40:40 -0400

Hi Mukund,

Thanks for your comments below. I agree that some adjustments to the
wording in the draft would be beneficial.

Thanks,
Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 2386 Panoramic Circle, Apopka, FL 32703 USA
 d3e3e3@xxxxxxxxx

On Tue, Oct 15, 2024 at 1:46 AM Mukund Sivaraman <muks@xxxxxxxxxx> wrote:
>
> Dear Mr. Eastlake,
>
> My minor review of the current draft:
>
> [snip]
>
> >                 The FNV Non-Cryptographic Hash Algorithm
> >                          draft-eastlake-fnv-29
>
> > Abstract
>
> >    FNV (Fowler/Noll/Vo) is a fast, non-cryptographic hash algorithm with
> >    good dispersion that is referenced in a number of standards documents
>
> [snip]
>
> > 1.  Introduction
>
> >    FNV hashes are designed to be fast while maintaining a low collision
> >    rate.  Their exceptional dispersion makes them particularly well-
> >    suited for hashing nearly identical strings, including URLs,
> >    hostnames, filenames, text, and IP addresses.  Their speed allows one
> >    to quickly hash lots of data while maintaining a reasonably low
> >    collision rate.
>
> In the above, marketing words are used to describe this function such as
> "good" and "exceptional" dispersion, "low collision rate," etc. But
> there doesn't appear to be any analysis presented to justify these,
> either in the draft or in the references.
>
> As this is the draft about FNV that is seeking to become an RFC, if a
> reader in the future wonders "Why are the authors saying these things?"
> he would want to read an analysis. I suggest including results of
> fitness tests that justify the words "exceptional dispersion."
>
> [snip]
>
> > 7.1.  Why is FNV Non-Cryptographic?
>
> >    A full discussion of cryptographic hash requirements and strength is
> >    beyond the scope of this document.  However, here are three
> >    characteristics of FNV that would generally be considered to make it
> >    non-cryptographic:
>
> >    1.  Sticky State - A cryptographic hash should not have a state in
> >        which it can stick for a plausible input pattern.  But, in the
> >        very unlikely event that the FNV hash variable becomes zero and
> >        the input is a sequence of zeros, the hash variable will remain
> >        at zero until there is a non-zero input byte and the final hash
> >        value will be unaffected by the length of that sequence of zero
> >        input bytes.  Of course, for the common case of fixed length
> >        input, this would usually not be significant because the number
> >        of non-zero bytes would vary inversely with the number of zero
> >        bytes and for some types of input, runs of zeros do not occur.
> >        Furthermore, the use of a different offset_basis or the inclusion
> >        of even a little unpredictable input may be sufficient to stop an
> >        adversary from inducing a zero hash variable.
>
> I suggest including pseudo-code on how inducing a zero hash state can be
> prevented by including unpredictable input on a per-input basis. Is it
> the same unpredicable input to be added for all inputs, or does it have
> to be changed per input? The above text appears to treat it casually,
> i.e., less than rigorously.
>
> [snip]
>
> > 7.2.  Inducing Collisions
>
> >    While use of a cryptographic hash should be considered when active
> >    adversaries are a factor, the following attack can be made much more
> >    difficult with very minor changes in the use of FNV:
>
> >    If FNV is being used in a known way for hash tables in a network
> >    server or the like, for example some part of a web server, an
> >    adversary could send requests calculated to cause hash table
> >    collisions and induce substantial processing delays.  As mentioned in
> >    Section 2.2, use of an offset_basis not known by the adversary will
> >    substantially reduce or eliminate this problem.
>
> This may not be the only problem.
>
> If a service has a collection (e.g., a cache) that uses a hashtable
> internally with 1<<N buckets, it may provide a way to inspect the
> collection such as by allowing it to be dumped or browsed. The
> implementation may not sort the collection when dumping or presenting
> from a hashtable, and it would be naively dumped in the order it occurs
> in the hash table.
>
> If such data can be inspected by an adversary, it may be possible to
> recover the 32-bit or 64-bit offset basis from using the order of keys
> in such a dump.
>
> For general purpose hashtable implementations (and in an off-the-shelf
> library that a programmer would use as a black-box) where an adversary
> can choose the keys that are inserted and later searched, this hash
> function has weaknesses that the programmer would have to consider and
> workaround, and for that reason it's generally unsuitable for use in a
> hashtable esp. within a public facing network service.
>
> I suggest covering the above in the section.
>
> This draft's abstract states that its purpose is documenting this
> function, for which it is fine.
>
> Even for non-cryptographic use I suggest noting what it is not good for
> in "1.1 FNV Hash Uses."
>
> The phrase "The FNV hash is widely used" comes across as typical
> marketing text to sell something. When one reads it, one may think "my
> peers use it, so it should be a good choice," i.e., its popularity
> influences the reader aside from its qualities. Popularity is no
> indication of quality, as can be observed in some things in the world
> today. The statement may not remain true in the future. I understand the
> authors did not add it as marketing text. It's best to leave such text
> out from an internet draft.
>
>                 Mukund

-- 
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx