Re: Suggestion on hashing

When I went through the code, I noted that SHA-1 hashes are
currently used for the following:

   * object IDs
   * authentication (a value to sign using public-key cryptography)
   * data integrity (basically a really good checksum).

While there are a lot of 20-byte arrays of unsigned char, many of those
are associated with lookups.  You might want to look at the places
where git_SHA1_Init is called (there aren't many of them, and each one
marks a point where a SHA-1 hash is being created).

Some of the things I tried were complete false starts (I kept those
out of the preliminary patches I sent), but I did manage to store
a CRC (which you can treat as a placeholder for a real message
digest) for each SHA-1 hash in a pack file.  I did it by creating a
separate file with the extension ".mds", and that worked.
I looked into modifying pack files instead, but that was too messy
given that you'd want older versions to still work with newer remote
repositories.  The other factor is that the "mds" files are computed
locally, at the same time as the "idx" file.  The formats of the
"pack" and "idx" files don't change.

I've just started replacing the CRC with real message digests, in a
way that makes new digests easy to add.  The plan is to make it work
initially with both a CRC and SHA-1: the CRC so I can test easily by
comparing the new and old versions to show that nothing changed when
it shouldn't have, and SHA-1 because Git already implements it.

I should finish my changes first.  If we are lucky, the changes I'm
trying might solve some of the problems you mentioned with pack files.
At least I can store the digests in a way that doesn't break the log
and fsck operations (the changes passed all the test suites, with only
minor modifications for things like counting the number of files in
particular directories).

If you make changes to commit objects, fixing the test scripts is a
pain: there are a number of places where SHA-1 values are hard-coded,
and those have to be replaced.

Bill

On Tue, 2011-12-06 at 01:56 +0000, Chris West (Faux) wrote:
> Nguyen Thai Ngoc Duy wrote:
> > SHA-1 characteristics (like the 20-byte length) are hard-coded everywhere
> > in git; it'd be a big audit.
> 
> I was planning to look at this anyway.  My branch[1] allows
>   init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.
> 
> log/fsck/etc. are all broken.  Don't even dare try packs.  Fixing things
>   is painful but not impossible.  I'm not convinced the task is even
>   remotely insurmountable.
> 
> (This is not a request-for-comments, just an informational notification.
>   It does not even attempt to address compatibility or the like.)

