GSOC on designing a faster index format

Hi!

I'm a first-year CS major (mostly proficient in C and Racket) at
UChicago looking at GSOC, and Git caught my interest, especially the
idea of designing a faster index format.  As the GSOC page recommended
starting a discussion on the mailing list, I thought I would do so
on this particular topic.

"Git is pretty slow when managing huge repositories in terms of files
in any given tree, as it needs to rewrite the index (in full) on
pretty much every operation. For example, even though logically git
add already_tracked_file only changes a single blob SHA-1 in the
index, Git will verify index correctness during loading and recompute
the new hash during writing over the whole index. It thus ends up
spending a large amount of time simply on hashing the index."

Doing anything slowly is never fun, especially as scale increases.
Dealing with version control for some of my course work, I appreciate
the speed possible with smaller projects, and I imagine the time saved
on larger projects would be quite valuable.  That was what
initially attracted me to this idea, beyond the general desire to
maximize efficiency.  However, rewriting the entire index with
every operation seems like a strange way to spend resources.  Verifying
correctness, it would seem, can generally be done without a rewrite,
and it need not be done exhaustively on every index edit if
efficiency is the utmost concern.  This, then, would seem to
indicate that hashing the whole index isn't always necessary.  Is it
done just to be 100% certain that the index is correct?
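
To make the cost in the quoted paragraph concrete, here is a minimal
sketch in Python rather than Git's own C.  It assumes only the broad
layout of the on-disk index: a 12-byte header ("DIRC", a version, an
entry count), the entries, and a trailing SHA-1 over everything before
it.  The entries are treated as opaque bytes, and the names
build_index, verify_index and replace_blob_id are hypothetical helpers
for illustration, not functions from Git.

```python
import hashlib
import struct

HEADER_LEN = 12   # 4-byte signature + 4-byte version + 4-byte entry count
SHA1_LEN = 20     # length of the trailing checksum


def build_index(payload: bytes, num_entries: int, version: int = 2) -> bytes:
    """Build a toy index: header, opaque payload, trailing SHA-1."""
    header = b"DIRC" + struct.pack(">II", version, num_entries)
    body = header + payload
    return body + hashlib.sha1(body).digest()


def verify_index(data: bytes) -> bool:
    """Check the trailing SHA-1; this has to read every byte of the index."""
    if len(data) < HEADER_LEN + SHA1_LEN or data[:4] != b"DIRC":
        return False
    body, checksum = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == checksum


def replace_blob_id(data: bytes, offset: int, new_id: bytes) -> bytes:
    """Overwrite one 20-byte blob id, then rehash the whole body.

    Only 20 bytes change logically, but the trailer covers the entire
    body, so the checksum is recomputed over all of it.
    """
    assert len(new_id) == SHA1_LEN
    body = bytearray(data[:-SHA1_LEN])
    body[offset:offset + SHA1_LEN] = new_id
    return bytes(body) + hashlib.sha1(body).digest()
```

In this sketch both verify_index and replace_blob_id touch the full
body, so their cost grows with the size of the index rather than with
the size of the change, which is the behaviour the project idea
describes.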

I also appreciate the later mention of the importance of the format
being as easy to parse as possible, as I tend to find such coding an
intellectually engaging exercise, and this is a valid reason to invest
resources in it.  Unfortunately, I'm not entirely familiar
with other programs that read Git's index.  Is there a specific set of
such programs I should look into?

Thank you,
Calvin Deutschbein