[Last-Call] Re: last call comments on FNV

Mukund Sivaraman <muks@xxxxxxxxxx> · Tue, 15 Oct 2024 13:46:31 +0800

Dear Mr. Eastlake,

My minor review of the current draft:

[snip]

>                 The FNV Non-Cryptographic Hash Algorithm
>                          draft-eastlake-fnv-29

> Abstract

>    FNV (Fowler/Noll/Vo) is a fast, non-cryptographic hash algorithm with
>    good dispersion that is referenced in a number of standards documents

[snip]

> 1.  Introduction

>    FNV hashes are designed to be fast while maintaining a low collision
>    rate.  Their exceptional dispersion makes them particularly well-
>    suited for hashing nearly identical strings, including URLs,
>    hostnames, filenames, text, and IP addresses.  Their speed allows one
>    to quickly hash lots of data while maintaining a reasonably low
>    collision rate.

In the above, marketing words are used to describe this function such as
"good" and "exceptional" dispersion, "low collision rate," etc. But
there doesn't appear to be any analysis presented to justify these,
either in the draft or in the references.

As this is the draft about FNV that is seeking to become an RFC, if a
reader in the future wonders "Why are the authors saying these things?"
he would want to read an analysis. I suggest including results of
fitness tests that justify the words "exceptional dispersion."

[snip]

> 7.1.  Why is FNV Non-Cryptographic?

>    A full discussion of cryptographic hash requirements and strength is
>    beyond the scope of this document.  However, here are three
>    characteristics of FNV that would generally be considered to make it
>    non-cryptographic:

>    1.  Sticky State - A cryptographic hash should not have a state in
>        which it can stick for a plausible input pattern.  But, in the
>        very unlikely event that the FNV hash variable becomes zero and
>        the input is a sequence of zeros, the hash variable will remain
>        at zero until there is a non-zero input byte and the final hash
>        value will be unaffected by the length of that sequence of zero
>        input bytes.  Of course, for the common case of fixed length
>        input, this would usually not be significant because the number
>        of non-zero bytes would vary inversely with the number of zero
>        bytes and for some types of input, runs of zeros do not occur.
>        Furthermore, the use of a different offset_basis or the inclusion
>        of even a little unpredictable input may be sufficient to stop an
>        adversary from inducing a zero hash variable.

I suggest including pseudo-code on how inducing a zero hash state can be
prevented by including unpredictable input on a per-input basis. Is it
the same unpredicable input to be added for all inputs, or does it have
to be changed per input? The above text appears to treat it casually,
i.e., less than rigorously.

[snip]

> 7.2.  Inducing Collisions

>    While use of a cryptographic hash should be considered when active
>    adversaries are a factor, the following attack can be made much more
>    difficult with very minor changes in the use of FNV:

>    If FNV is being used in a known way for hash tables in a network
>    server or the like, for example some part of a web server, an
>    adversary could send requests calculated to cause hash table
>    collisions and induce substantial processing delays.  As mentioned in
>    Section 2.2, use of an offset_basis not known by the adversary will
>    substantially reduce or eliminate this problem.

This may not be the only problem.

If a service has a collection (e.g., a cache) that uses a hashtable
internally with 1<<N buckets, it may provide a way to inspect the
collection such as by allowing it to be dumped or browsed. The
implementation may not sort the collection when dumping or presenting
from a hashtable, and it would be naively dumped in the order it occurs
in the hash table.

If such data can be inspected by an adversary, it may be possible to
recover the 32-bit or 64-bit offset basis from using the order of keys
in such a dump.

For general purpose hashtable implementations (and in an off-the-shelf
library that a programmer would use as a black-box) where an adversary
can choose the keys that are inserted and later searched, this hash
function has weaknesses that the programmer would have to consider and
workaround, and for that reason it's generally unsuitable for use in a
hashtable esp. within a public facing network service.

I suggest covering the above in the section.

This draft's abstract states that its purpose is documenting this
function, for which it is fine.

Even for non-cryptographic use I suggest noting what it is not good for
in "1.1 FNV Hash Uses."

The phrase "The FNV hash is widely used" comes across as typical
marketing text to sell something. When one reads it, one may think "my
peers use it, so it should be a good choice," i.e., its popularity
influences the reader aside from its qualities. Popularity is no
indication of quality, as can be observed in some things in the world
today. The statement may not remain true in the future. I understand the
authors did not add it as marketing text. It's best to leave such text
out from an internet draft.

		Mukund
Attachment:
signature.asc

Description: PGP signature
-- 
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx