On Sat, Oct 28, 2017 at 11:12 AM, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: > This is a series proposing a basic abstraction for hash functions. > > As we get closer to converting the remainder of the codebase to use > struct object_id, we should think about the design we want our hash > function abstraction to take. This series is a proposal for one idea. > Input on any aspect of this proposal is welcome. I like the series/idea. > This series exposes a struct git_hash_algo that contains basic > information about a given hash algorithm that distinguishes it from > other algorithms: name, identifiers, lengths, implementing functions, > and empty tree and blob constants. It also exposes an array of hash > algorithms, and a constant for indexing them. > > The series also demonstrates a simple conversion using the abstraction > over empty blob and tree values. That looks good, though I wonder if we'll have to give a way a little performance during the transition period (or even indefinitely, as the hash abstraction won't go away; we'd rather want to keep it in place long term). I guess we can measure after all is said and done, no big deal. > > In order to avoid conflicting with the struct repository work and with > the goal of avoiding global variables as much as possible, I've pushed > the hash algorithm into struct repository and exposed it via a #define. If you are referring to that long series[1] that Jonathan and I were working on, then no worries. Unfortunately that series got some delays, but I rebased its prepared successors (of over 100 patches) on Friday and it was surprisingly easy to do so; just tedious. [1] https://public-inbox.org/git/20170830064634.GA153983@xxxxxxxxxxxxxxxxxxxxxxxxx/ > > I propose this series now as it will inform the way we go about > converting other parts of the codebase, especially some of the pack > algorithms. Thanks for doing this now. > Because we share some hash computation code between pack > checksums and object hashing, we need to decide whether to expose pack > checksums as struct object_id, even though they are technically not > object IDs. I think we used sha1 as checksums, because it was a hash function easily accessible, not because its other properties were the best to do its job. So I am undecided if we want to just "keep the same hash function for checksum as well as object hashing" or pickup another checksum, that actually *only* does checksumming (we don't need the cryptographic properties, such that an easy hash [such as crc, djb, fnv or murmur] would do; raw speed, barely protecting the pack file against bit flips). For the sake of symmetry I tend to go with the former, i.e use the object hash function for pack files, too. > Furthermore, if we end up needing to stuff an algorithm > value into struct object_id, we'll no longer be able to directly > reference object IDs in a pack without a copy. > > I've updated this series in some significant ways to reflect and better > implement the transition plan as it's developed. If there are ways > in which this series (or future series) can converge better on the > transition plan, that input would be valuable. > > This series is available from the usual places as branch hash-struct, > based against master as of 2.15-rc2. Thanks, Stefan