= Overview

This is an RFC series proposing a basic abstraction for hash functions.

As we get closer to converting the remainder of the codebase to use
struct object_id, we should think about the design we want our hash
function abstraction to take. This series is a proposal for one idea to
start discussion. Input on any aspect of this proposal is welcome.

This series exposes a struct git_hash_algo that contains basic
information about a given hash algorithm that distinguishes it from
other algorithms: name, lengths, implementing functions, and empty tree
and blob constants. It also exposes an array of hash algorithms, and a
constant for indexing them.

The series also demonstrates a simple conversion using the abstraction
over empty blob and tree values.

In order to avoid conflicting with the struct repository work and with
the goal of avoiding global variables as much as possible, I've pushed
the hash algorithm into struct repository and exposed it via a #define.
This necessitates pulling repository.h into cache.h, which I don't think
is fatal. Doing that, in turn, necessitated some work on the Subversion
code to avoid conflicts.

It should be fine for Junio to pick up the first two patches from this
series, as they're relatively independent and valuable without the rest
of the series. The rest should not be applied immediately, although they
do pass the test suite.

I proposed this series now as it will inform the way we go about
converting other parts of the codebase, especially some of the pack
algorithms. Because we share some hash computation code between pack
checksums and object hashing, we need to decide whether to expose pack
checksums as struct object_id, even though they are technically not
object IDs. Furthermore, if we end up needing to stuff an algorithm
value into struct object_id, we'll no longer be able to directly
reference object IDs in a pack without a copy.
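
To make the shape of the abstraction described above concrete, here is a
minimal, self-contained sketch. It is not the code from the patches: the
field names (ctxsz, rawsz, hexsz, init_fn and friends), the enum names,
the "sha1" string and the is_empty_tree()/is_empty_blob() helpers are
assumptions for illustration, and the function pointers are left NULL
because no SHA-1 implementation is included here. Only the empty tree
and empty blob SHA-1 values are real.

```c
#include <stddef.h>
#include <string.h>

#define GIT_MAX_RAWSZ 20

struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
};

/* Hypothetical signatures for the implementing functions. */
typedef void (*git_hash_init_fn)(void *ctx);
typedef void (*git_hash_update_fn)(void *ctx, const void *in, size_t len);
typedef void (*git_hash_final_fn)(unsigned char *hash, void *ctx);

struct git_hash_algo {
	const char *name;                   /* name used for lookups */
	size_t ctxsz;                       /* size of the hash context */
	size_t rawsz;                       /* length of the raw hash in bytes */
	size_t hexsz;                       /* length of the hex form */
	git_hash_init_fn init_fn;
	git_hash_update_fn update_fn;
	git_hash_final_fn final_fn;
	const struct object_id *empty_tree;
	const struct object_id *empty_blob;
};

/* Constant for indexing the array of algorithms (names assumed). */
enum { HASH_ALGO_UNKNOWN, HASH_ALGO_SHA1, HASH_ALGO_COUNT };

/* 4b825dc642cb6eb9a060e54bf8d69288fbee4904 */
static const struct object_id empty_tree_sha1 = {{
	0x4b, 0x82, 0x5d, 0xc6, 0x42, 0xcb, 0x6e, 0xb9, 0xa0, 0x60,
	0xe5, 0x4b, 0xf8, 0xd6, 0x92, 0x88, 0xfb, 0xee, 0x49, 0x04
}};

/* e69de29bb2d1d6434b8b29ae775ad8d2e48c5391 */
static const struct object_id empty_blob_sha1 = {{
	0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
	0x29, 0xae, 0x77, 0x5a, 0xd8, 0xd2, 0xe4, 0x8c, 0x53, 0x91
}};

static const struct git_hash_algo hash_algos[HASH_ALGO_COUNT] = {
	{ NULL, 0, 0, 0, NULL, NULL, NULL, NULL, NULL },
	/* ctxsz and the function pointers are placeholders in this sketch. */
	{ "sha1", 0, 20, 40, NULL, NULL, NULL,
	  &empty_tree_sha1, &empty_blob_sha1 },
};

/* Stand-in for struct repository carrying the algorithm in use. */
struct repository {
	const struct git_hash_algo *hash_algo;
};

static struct repository the_repo = { &hash_algos[HASH_ALGO_SHA1] };

/* The single define, so callers never name a specific algorithm. */
#define current_hash (the_repo.hash_algo)

static int is_empty_tree(const struct object_id *oid)
{
	return !memcmp(oid->hash, current_hash->empty_tree->hash,
		       current_hash->rawsz);
}

static int is_empty_blob(const struct object_id *oid)
{
	return !memcmp(oid->hash, current_hash->empty_blob->hash,
		       current_hash->rawsz);
}
```

The point of routing everything through current_hash is that a caller
such as is_empty_tree() compares rawsz bytes against whatever the
repository's algorithm says the empty tree is, rather than hardcoding a
20-byte SHA-1 constant.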

This series is available from the usual places as branch hash-struct,
based against master.

= Naming

The length names are similar to the current constant names
intentionally. I've used the "hash_algo" name for both the integer
constant and the pointer to struct, although we could change the latter
to "hash_impl" or such as people like.

I chose to name the define "current_hash" and expose no other defines.
The name is relatively short since we're going to be typing it a lot.
However, if people like, we can capitalize it or expose other defines
(say, a GIT_HASH_RAWSZ or GIT_HASH_HEXSZ) instead of or in addition to
current_hash, which would make this name less interesting.

Feel free to propose alternatives to the naming of anything in this
series.

= Open Issues

I originally decided to convert hex.c as an example, but quickly found
out that this caused segfaults. As part of setup, we call
is_git_directory, which calls validate_headref, which ends up calling
get_sha1_hex. Obviously, we don't have a repository at that point, so
the hash algorithm isn't set up yet. This is an area we'll need to
consider making hash-function agnostic, and we may also need to consider
inserting a hash constant integer into struct object_id if we're going
to do that. Alternatively, we could just paper over this issue as a
special case.

Clearly we're going to want to expose some sort of lookup functionality
for hash algorithms. We'll need to expose lookup by name (for the
.git/config file and any command-line options), but we may want other
functions as well. What functions should those be? Should we expose the
structure or the constant for those lookup functions? If the structure,
we'll probably need to expose the constant in the structure as well for
easy use.

Should we avoid exposing the array of structures altogether and wrap
this in a function?

We could expose a union of hash context structures and take that as the
pointer type for the API calls. That would probably obviate the need for
ctxsz.
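
As one possible answer to the lookup questions above, here is a rough
sketch in which the array stays private to one file and callers go
through two functions: lookup by name returns the integer constant, and
a second function maps the constant to the structure. Nothing here is
from the patches; hash_algo_by_name(), hash_algo_by_id(), the enum names
and the "sha1" string are all hypothetical, and the structure is trimmed
to the fields needed for the example.

```c
#include <stddef.h>
#include <string.h>

/* Trimmed-down structure; the real one would carry more fields. */
struct git_hash_algo {
	const char *name;
	size_t rawsz;
	size_t hexsz;
};

enum { HASH_ALGO_UNKNOWN, HASH_ALGO_SHA1, HASH_ALGO_COUNT };

/* Kept static so that only the functions below can reach the array. */
static const struct git_hash_algo hash_algos[HASH_ALGO_COUNT] = {
	{ NULL, 0, 0 },
	{ "sha1", 20, 40 },
};

/*
 * Look up an algorithm by name, e.g. a value read from .git/config or
 * a command-line option. Returns the constant, or HASH_ALGO_UNKNOWN if
 * the name is NULL or not recognized.
 */
int hash_algo_by_name(const char *name)
{
	int i;

	if (!name)
		return HASH_ALGO_UNKNOWN;
	/* Start at 1: slot 0 is the "unknown" entry and has no name. */
	for (i = 1; i < HASH_ALGO_COUNT; i++)
		if (!strcmp(name, hash_algos[i].name))
			return i;
	return HASH_ALGO_UNKNOWN;
}

/*
 * Map a constant to its structure. Returns NULL for the unknown entry
 * and for out-of-range values, so callers cannot index past the array.
 */
const struct git_hash_algo *hash_algo_by_id(int id)
{
	if (id <= HASH_ALGO_UNKNOWN || id >= HASH_ALGO_COUNT)
		return NULL;
	return &hash_algos[id];
}
```

Returning the constant from the by-name lookup keeps the value cheap to
store (for example in struct object_id, should we go that way), while
the second function covers callers that need the lengths. The cost is an
extra call compared with indexing an exposed array directly.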

We could expose hex versions of the blob constants if desired. This
might make converting the remaining pieces of code that use them easier.

There are probably dozens of other things I haven't thought of yet as
well.

brian m. carlson (6):
  vcs-svn: remove unused prototypes
  vcs-svn: rename repo functions to "svn_repo"
  setup: expose enumerated repo info
  Add structure representing hash algorithm
  Integrate hash algorithm support with repo setup
  Switch empty tree and blob lookups to use hash abstraction

 builtin/am.c        |  2 +-
 builtin/checkout.c  |  2 +-
 builtin/diff.c      |  2 +-
 builtin/pull.c      |  2 +-
 cache.h             | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 diff-lib.c          |  2 +-
 merge-recursive.c   |  2 +-
 notes-merge.c       |  2 +-
 repository.c        |  7 +++++++
 repository.h        |  5 +++++
 sequencer.c         |  6 +++---
 setup.c             | 48 +++++++++++++++++++++++++++---------------------
 sha1_file.c         | 29 +++++++++++++++++++++++++++++
 submodule.c         |  2 +-
 vcs-svn/repo_tree.c |  6 +++---
 vcs-svn/repo_tree.h | 13 +++----------
 vcs-svn/svndump.c   |  8 ++++----
 17 files changed, 133 insertions(+), 53 deletions(-)