Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

Andreas Dilger <adilger@xxxxxxxxx> · Wed, 24 Sep 2014 00:25:53 +0200

On Sep 21, 2014, at 7:55 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Sun, Sep 21, 2014 at 05:53:39AM -0400, George Spelvin wrote:
>> 
>> Basically, it offers security similar to teahash with a faster, and better studied, primitive designed specifically for this application.
>> 
>> I'm thinking of turning this into a patch for ext2utils and fs/ext4.
>> 
>> Could I ask what the general level of interest is?  On a scale of "hell,
>> no, not more support burden!" to "thank you, I've been meaning to find
>> time to add that!"
> 
> I'm certainly not against adding a new hash function.  The reality is
> that it would be quite a while before we could turn it on by default,
> because of the backwards compatibility concerns.
> 
> The question I would ask is whether we can show an actual performance
> improvement with the hash being used in situ.  Let's give it the best
> possible chance of making a difference; let's assume a RAM disk with a
> very metadata intensive benchmark, with journalling turned off.  What
> sort of difference would we see, either in terms of system CPU time,
> wall clock time, etc.?
> 
> The results of such a benchmark would certainly make a difference in
> how aggressively we might try to phase in a new hash algorithm.

Now that the patches are available, it makes sense to run some
directory-intensive benchmark to see whether the improved hash
function actually shows improved performance.  The hash may be
somewhat faster, but since this is only hashing the filename and
not KB/MB of data, it isn't clear whether this is going to improve
observable performance of directory operations.

I'm not sure what a suitable benchmark for this is, however.  It
needs to do filename lookups to exercise the hash, but in the
workloads that I can think of, far more work is done on the file
after its name is looked up (e.g. open(), stat(), etc.).  Some
possibilities include "ls -l" or "mv A/* B/".  It may be that the
only way to see the difference is via oprofile.
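One possible microbenchmark, sketched under my own assumptions (all
helper names are hypothetical, and the directory must live on the
filesystem under test, e.g. an ext4 RAM disk with dir_index enabled
and no journal): populate one large directory, then time lookups of
names that do not exist.  Because each missing name is used only
once, every lookup should miss the dentry cache and reach the
filesystem's own lookup code, where the directory hash is computed,
without the open()/stat() work on a real inode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create a fresh directory from a template such as
 * "/mnt/ramdisk/benchXXXXXX" and return an O_DIRECTORY fd for it. */
static int make_bench_dir(char *tmpl)
{
    if (mkdtemp(tmpl) == NULL)
        return -1;
    return open(tmpl, O_RDONLY | O_DIRECTORY);
}

/* Hypothetical helper: create n empty files "f0000000", "f0000001", ...
 * so the directory grows past one block and becomes hash-indexed
 * (assumes dir_index is enabled on the filesystem under test). */
static int populate_dir(int dirfd, long n)
{
    char name[32];
    for (long i = 0; i < n; i++) {
        snprintf(name, sizeof(name), "f%07ld", i);
        int fd = openat(dirfd, name, O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return 0;
}

/* Monotonic clock in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: average nanoseconds per lookup of a name that does
 * not exist.  Each name is used once (tag separates runs), so each lookup
 * misses the dentry cache and goes down to the filesystem lookup path. */
static double time_negative_lookups(int dirfd, long n, unsigned tag)
{
    char name[48];
    struct stat st;
    double start = now_ns();
    for (long i = 0; i < n; i++) {
        snprintf(name, sizeof(name), "miss-%u-%07ld", tag, i);
        int rc = fstatat(dirfd, name, &st, 0);
        assert(rc == -1 && errno == ENOENT); /* the name must really be absent */
        (void)rc;
    }
    return (now_ns() - start) / (double)n;
}
```

Running this with a large n on a RAM-disk filesystem made once with
each hash algorithm, and comparing the per-lookup times, would be one
way to approach Ted's suggested setup.  Each miss still pays for
allocating a negative dentry, so oprofile would still be needed to see
how much of that time is the hash itself.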

It also isn't clear whether siphash is significantly stronger than
"halfmd4", which is already cryptographically strong.
Since the filename hash is also a function of the filesystem-unique
s_hash_seed, mounting an "attack" on a directory needs to be specific
to a particular filesystem, and isn't portable to other filesystems.
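To make both the cost and the seeding concrete, here is a sketch of
SipHash-2-4 following the published reference algorithm.  It is not
code from George's patches: treating the 128-bit key as the
filesystem's s_hash_seed, and returning the full 64-bit value rather
than whatever the directory code would keep, are assumptions for
illustration.  A filename of up to 23 bytes is at most two full
8-byte words plus the final length word, i.e. at most ten SipRounds,
which is probably small next to the rest of a lookup.  The key also
shows why an attack is per-filesystem: the same name hashes to an
unrelated value under a different key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound: add-rotate-xor mixing of the 4 x 64-bit state. */
#define SIPROUND                                                      \
    do {                                                              \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;                      \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;                      \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
    } while (0)

/* Read 8 bytes as a little-endian 64-bit integer. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* SipHash-2-4 of msg[0..len) under a 128-bit key; here the key stands in
 * for the per-filesystem s_hash_seed (an assumption for illustration). */
static uint64_t siphash24(const uint8_t key[16], const void *msg, size_t len)
{
    const uint8_t *in = msg;
    uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
    uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
    uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
    uint64_t v3 = k1 ^ 0x7465646279746573ULL;
    size_t full = len - len % 8;

    assert(msg != NULL || len == 0);

    /* Compression: 2 SipRounds per full 8-byte word. */
    for (size_t i = 0; i < full; i += 8) {
        uint64_t m = load_le64(in + i);
        v3 ^= m;
        SIPROUND;
        SIPROUND;
        v0 ^= m;
    }

    /* Final word: the 0-7 leftover bytes, with len mod 256 in the top byte. */
    uint64_t b = (uint64_t)len << 56;
    for (size_t i = 0; i < len % 8; i++)
        b |= (uint64_t)in[full + i] << (8 * i);
    v3 ^= b;
    SIPROUND;
    SIPROUND;
    v0 ^= b;

    /* Finalization: 4 SipRounds. */
    v2 ^= 0xff;
    SIPROUND;
    SIPROUND;
    SIPROUND;
    SIPROUND;
    return v0 ^ v1 ^ v2 ^ v3;
}
```

With the reference key 00 01 ... 0f, the empty message should hash to
0x726fdb47dd0e0e31, which is a quick way to check a port against the
published test vectors.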

Cheers, Andreas
