[IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits

You do not need to read this. Really, stop now.

This is a quick hack I wrote just before leaving work to show that I
could indeed push commits whose IDs start with 31337 to our main
repository. Names hidden to protect the innocent.

With a quick call to getenv() it'll be easy to supply the prefix with
an environment variable, so you can have d00dbabe, deadbeef, b00b
etc. commits as time allows.
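
A minimal sketch of that idea, not part of the patch below: read the
prefix from an environment variable and fall back to "31337" when it
is unset or unusable. The variable name GIT_VANITY_PREFIX and both
function names are made up for illustration. Only lowercase hex digits
are accepted, on the assumption that the prefix gets compared against
sha1_to_hex() output as in the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; the patch itself hardcodes "31337". */
#define VANITY_ENV "GIT_VANITY_PREFIX"

/* Return 1 if s is 1..40 lowercase hex digits, else 0. */
int valid_hex_prefix(const char *s)
{
    size_t i, n;

    assert(s != NULL);
    n = strlen(s);
    if (n == 0 || n > 40)
        return 0;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isdigit(c) && !(c >= 'a' && c <= 'f'))
            return 0;
    }
    return 1;
}

/* Prefix from the environment, or "31337" if unset or not valid hex. */
const char *vanity_prefix(void)
{
    const char *p = getenv(VANITY_ENV);

    return (p && valid_hex_prefix(p)) ? p : "31337";
}
```

The strncmp() in the patch would then compare strlen(prefix) characters
against vanity_prefix() instead of the literal "31337".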

    < cow-orker-1> avar: what do I have to do to have all my
                            commit ids begin with 31337?
    < cow-orker-2> cow-orker-1: just figure out SHA1 collision
    < cow-orker-2> well with MD5 it is "easy" to have
                   same-prefix collisions, so you can
                   append junk that will be ignored or
                   have no effect, and have the same MD5
                   as the shorter version
    <@avar> It actually wouldn't be too hard to make all your commits
            begin with 31337
    <@avar> I might patch my git client to do that
    <@avar> Basically the sha1 is a function of the commit object /
            message / top-level tree
    <@avar> and any compliant git client ignores headers it doesn't know about
    <@avar> so you can just add a header "lulz %d"
    <@avar> where %d is a number you keep incrementing until your
            resulting sha1 happens to begin with 31337
    <@avar> 31337 is a 5 character hex string, and 16^5/2 = 524288
    <@avar> so you'd on average have to generate half a million
            commits before you'd get the desired results
    <@avar> which would slow down git somewhat at commit time, but not
            much more than using svn :)
    < cow-orker-2> enjoy your 3MB commits
    < cow-orker-2> ah you can do that on the same header
    <@avar> yes
    < cow-orker-2> thanks, you just ruined my evening
    [...]
    <@avar> Also you don't have to waste your evening, I've implemented this
    < cow-orker-1> \o/
    < cow-orker-3> hurry up and make a commit, already
    < cow-orker-3> [img-mj-popcorn.gif]
    <@avar> victory is mine: 313375d995e6f8b7773c6ed1ee165e5a9e15690b !

This is how you use it:

    $ time ~/g/git/git-commit -F /tmp/commit-message
    Try 0/4000000 to get a 1337 commit = 33d86a5a13ce07914a38ead3a517e391df0cc8c2
    Try 100000/4000000 to get a 1337 commit = 058e7663f54a3dd85e2c5a88cebf221ff7c25889
    Try 200000/4000000 to get a 1337 commit = 78839302cf449d088db1df5eb81f9116cbce55d0
    Try 300000/4000000 to get a 1337 commit = f0af0cbe91d0ba8903085e2f867246095e6fb957
    Try 400000/4000000 to get a 1337 commit = 2068605f90321558e97ae5a083f63475ae2075ea
    Try 500000/4000000 to get a 1337 commit = f8b4e2acd14ab111fd7957956368d181596bc6ef
    Try 600000/4000000 to get a 1337 commit = 3a479ab83d969f9b5638481445506036d1e1db46
    Try 700000/4000000 to get a 1337 commit = 29b98585b44c02e59240c1d8a1956ae434b3543f
    Try 800000/4000000 to get a 1337 commit = 8749e31c40200554b3758313cada1a3f596b0230
    Try 900000/4000000 to get a 1337 commit = 3fb2e617db95db8c027fba686c3f75eaa1b7f880
    commit id = 313375d995e6f8b7773c6ed1ee165e5a9e15690b
    [trunk 313375d] <censored>
     <censored>

    real    1m15.389s
    user    0m45.969s
    sys     0m29.368s

In my case that takes just over a minute and generates:

    $ git show --pretty=raw 313375d995e6f8b7773c6ed1ee165e5a9e15690b | head -n 7
    commit 313375d995e6f8b7773c6ed1ee165e5a9e15690b
    tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c
    parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a
    author Ævar Arnfjörð Bjarmason <avar@xxxxxxxxxxx> <censored> <censored>
    committer Ævar Arnfjörð Bjarmason <avar@xxxxxxxxxxx> <censored> <censored>
    lulz 697889
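
As a rough check on the numbers in the log above, assuming each
candidate hash behaves like an independent uniform draw: a 5-digit
prefix has 16^5 = 1048576 possible values, so the mean number of tries
is about 16^5 rather than 16^5/2, and 16^5/2 tries succeed only about
39% of the time. The 4000000-try cap shown in the output above leaves
roughly a 2% chance of giving up empty-handed, and the 697889 tries
seen here are unremarkable. The function names in this sketch are made
up for illustration.

```c
#include <assert.h>

/* Number of distinct hex prefixes of the given length: 16^digits. */
unsigned long prefix_space(int digits)
{
    unsigned long n = 1;

    /* 16^7 still fits in a 32-bit unsigned long; 16^8 would not. */
    assert(digits >= 0 && digits <= 7);
    while (digits-- > 0)
        n *= 16;
    return n;
}

/*
 * Probability that at least one of `tries` independent attempts hits
 * one specific prefix out of `space` equally likely ones:
 * 1 - (1 - 1/space)^tries, computed by repeated multiplication so
 * that no math library is needed.
 */
double hit_probability(unsigned long space, unsigned long tries)
{
    double miss = 1.0;
    double q = 1.0 - 1.0 / (double)space;
    unsigned long i;

    for (i = 0; i < tries; i++)
        miss *= q;
    return 1.0 - miss;
}
```

For example, hit_probability(prefix_space(5), 524288) comes out near
0.39, and hit_probability(prefix_space(5), 4000000) near 0.98.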

And this is the evil one-off code for this:

    diff --git a/commit.c b/commit.c
    index 73b7e00..71d1605 100644
    --- a/commit.c
    +++ b/commit.c
    @@ -853,2 +853,4 @@ int commit_tree(const char *msg, unsigned char *tree,
            int encoding_is_utf8;
    +       int try;
    +       int tries = 4000000;
            struct strbuf buffer;
    @@ -857,2 +859,5 @@ int commit_tree(const char *msg, unsigned char *tree,

    +       struct commit_list *parents_ptr = parents;
    +
    +       for (try = 0; try < tries; try++) {
            /* Not having i18n.commitencoding is the same as having utf-8 */
    @@ -872,3 +877,2 @@ int commit_tree(const char *msg, unsigned char *tree,
                            sha1_to_hex(parents->item->object.sha1));
    -               free(parents);
                    parents = next;
    @@ -876,2 +880,4 @@ int commit_tree(const char *msg, unsigned char *tree,

    +               parents = parents_ptr;
    +
            /* Person/date information */
    @@ -883,2 +889,4 @@ int commit_tree(const char *msg, unsigned char *tree,
                    strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
    +
    +               strbuf_addf(&buffer, "lulz %d\n", try);
            strbuf_addch(&buffer, '\n');
    @@ -893,3 +901,21 @@ int commit_tree(const char *msg, unsigned char *tree,
            result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
    +
    +               if (result) {
    +                       die("failed to write commit object");
    +               } else {
    +                       if (strncmp(sha1_to_hex(ret), "31337", 5) == 0) {
    +                               printf("commit id = %s\n", sha1_to_hex(ret));
    +                               goto done;
    +                       } else {
    +                               if (try % 100000 == 0) {
    +                                       fprintf(stderr, "Try %d/%d to get a 1337 commit = %s\n", try, tries, sha1_to_hex(ret));
    +                               }
    +                               unlink_or_warn(sha1_file_name(ret));
    +                       }
            strbuf_release(&buffer);
    +               }
    +       }
    +       done:
    +       strbuf_release(&buffer);
    +
            return result;

There's nothing novel about this; I just thought it would be nice to
share it. Someone asked me whether having a "lulz" header would break
things. It doesn't: ever since we introduced the "encoding" header a
while back, clients have learned to ignore unknown headers.

Which is why we can discuss e.g. adding GPG headers without worrying
about breaking everything.
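
To illustrate why an extra header is harmless, here is a toy header
lookup. It is not git's actual parser: the function name commit_header
and its interface are assumptions for this sketch. A reader that asks
only for the headers it knows never looks at "lulz", and anything
after the blank line is message text, not a header.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the value of the first header named `key` into `out` (truncated
 * to fit) and return 1, or return 0 if there is no such header.
 * Headers are "key SP value LF" lines; the first blank line ends them.
 */
int commit_header(const char *raw, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *line = raw;

    assert(outlen > 0);
    while (*line && *line != '\n') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > klen && line[klen] == ' ' &&
            strncmp(line, key, klen) == 0) {
            size_t vlen = len - klen - 1;

            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, line + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (!eol)
            break;
        line = eol + 1; /* not the header we want: skip it */
    }
    return 0;
}
```

Asking this for "tree" or "parent" on the commit shown earlier works
whether or not the "lulz" line is present.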

I also think it's interesting that we seemingly don't have (in my
brief search, maybe I missed it) an API for writing a complete commit
object into a strbuf, so I had to write a lot of commit objects to
disk, and keep unlinking the unacceptable candidates so the repository
wouldn't balloon in size.