Typesafer git hash patch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



So because of the whole SHA1 discussion, I started looking at what it
would involve to turn

    unsigned char *sha1

style arguments (and various structure members) in the git source into

   typedef struct { ... } hash_t;

   hash_t *hash;

The answer is that it's pretty painful - more so than I expected (I
looked at it _looong_ ago and decided it was painful - and it has
become more painful as git has grown).

But once I got started I just continued through the slog.

Having the hashes be more encapsulated does seem to make things better
in many ways. What I did was to also just unify the notion of "hash_t"
and "struct object_id", so the two are entirely interchangeable. That
actually can clean up some code, because right now we have duplicate
interfaces for some things that take an oid pointer and others take a
"const unsigned char *sha1", and that duplication essentially can go
away. I didn't do any of that, I tried to keep it as as brute-force
stupid conversion.

I saw that somebody is actually looking at doing this "well" - by
doing it in many smaller steps. I tried. I gave up. The "unsigned char
*sha1" model is so ingrained that doing it incrementally just seemed
like a lot of pain. But it might be the right approach.

It might particularly be the right approach considering the size of the patch:

 216 files changed, 4174 insertions(+), 4080 deletions(-)

which is pretty nasty. The good news is that my patch passes all the
tests, and while it's big it's mostly very very mindless, and a lot of
it looks like cleanups, and the lines are generally shorter, eg

-               const unsigned char *mb = result->item->object.oid.hash;
-               if (!hashcmp(mb, current_bad_oid->hash)) {

turns into

+               const hash_t *mb = &result->item->object.oid;
+               if (!hashcmp(mb, current_bad_oid)) {

but I ended up also renaming a lot of common "sha1" as "hash", which
adds to the noise in that patch.

NOTE! It doesn't actually _fix_ the SHA1-centricity in any way, but it
makes it a bit more obvious where the bigger problems are. Not that
anybody would be surprised by what they are, but as part of writing
the patch it did kind of pinpoint most of them, and about 30 of those
new lines are added

 /* FIXME! Hardcoded hash sizes */
 /* FIXME! Lots of fixed-size hashes */
 /* FIXME! Fixed 20-byte hash usage */

with the rest of the added lines being a few helper functions and
splitting cache.h up a bit.

And as part of the type safety, I do think I may have found a bug:

show_one_mergetag():

                strbuf_addf(&verify_message, "tag %s names a non-parent %s\n",
                                    tag->tag, tag->tagged->oid.hash);

note how it prints out the "non-parent %s", but that's a SHA1 hash
that hasn't been converted to hex. Hmm?

Anyway, is anybody interested?  I warn you - it really is one big
patch on top of 2.12.

                   Linus

PS. Just for fun, I tried to look what it does when you then merge pu
or next.. You do get a fair number of merge conflicts because there's
just a lot of changes all around, but they look manageable. So merging
something like that would be painful, but it appears to not entirely
break other development.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]