On Tue, Aug 28 2018, Edward Thomson wrote:

> On Tue, Aug 28, 2018 at 2:50 PM, Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>> If we instead had something like clean/smudge filters:
>>
>>     [extensions]
>>         objectFilter = sha256-to-sha1
>>         compatObjectFormat = sha1
>>     [objectFilter "sha256-to-sha1"]
>>         clean = ...
>>         smudge = ...
>>
>> We could apply arbitrary transformations on objects through filters
>> which would accept/return some simple format requesting them to
>> translate such-and-such objects, and would either return object
>> names/types under which to store them, or "nothing to do".
>
> If I'm understanding you correctly, then on the libgit2 side, I'm very much
> opposed to this proposal. We never execute commands, nor do I want to start
> thinking that we can do so arbitrarily. We run in environments where that's
> a non-starter.

I'm being unclear. I'm suggesting that we slightly amend the syntax of
what we're proposing to put in the .git/config to leave the door open
for *optionally* doing arbitrary mappings.

It would still work exactly the same internally for the common
sha1<->sha256 case, i.e. neither git, libgit2, jgit nor anyone else
would need to shell out to anything. They'd just pick up that common
case and handle it internally, similar to how e.g. the crlf filter
(vs. full clean/smudge support) works in git & libgit2:
https://github.com/libgit2/libgit2/blob/master/tests/filter/crlf.c

So the sha256<->sha1 support would be an implicit built-in like crlf;
it would just leave the door open to having something like git-lfs.

Now what does that really mean? And I admit I may be missing something
here.
Unlike smudge/clean filters we're going to be constrained by having
hashes of length 20 or 32 bytes, locally & remotely, since we wouldn't
want to support arbitrary lengths, but with relatively small changes
it'd allow for extending just:

    # local   remote
    sha256<->sha1

To also support:

    # local       remote
    fn(sha1)<->fn(sha1)
    fn(sha1)<->fn(sha256)
    fn(sha256)<->fn(sha1)
    fn(sha256)<->fn(sha256)

Where fn() is some hook you'd provide to hook into the bits where we're
e.g. unpacking SHA-1 objects from the remote, and writing them locally
as SHA-256, except instead of (as we do by default) writing:

    SHA256_map(sha256(content)) = content

You'd write:

    SHA256_map(sha256(fn(content))) = fn(content)

Where fn() would need to be idempotent.

Now, why is this useful or worth considering? As noted in the E-Mail I
linked to, it allows for some novel use cases for doing local to remote
object translation.

But really, I'm not suggesting that *that* is something we should
consider. *All* I'm saying is that given the experience of how we
started out with stuff like built-in "crlf", and then grew smudge/clean
filters, it's worth considering what sort of .git/config key-value
pairs we'd pick that would lend themselves to such future extensions,
should that be something we deem to be a good idea in the future.

Because if we never do, we've lost nothing, but if we do and haven't
planned for it, we'd need to support two sets of config syntaxes to do
those two related things.

> At present, in libgit2, users can provide their own mechanism for running
> clean/smudge filters. But hash transformation / compatibility is going to
> be a crucial compatibility component. So this is not something that we
> could simply opt out of or require users to implement themselves.

Indeed.
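
For concreteness, the fn() mapping described above could be sketched
roughly like this. It is only an illustration under stated assumptions:
the object store is a plain dict, the names default_store,
filtered_store and strip_trailing_whitespace are made up for this
example, and it hashes the raw content without the "<type> <size>\0"
header that git prepends to real objects.

```python
import hashlib


def default_store(object_map, content):
    """Default case: SHA256_map(sha256(content)) = content."""
    name = hashlib.sha256(content).hexdigest()
    object_map[name] = content
    return name


def filtered_store(object_map, content, fn):
    """Filtered case: SHA256_map(sha256(fn(content))) = fn(content).

    fn is a hypothetical user-supplied hook. It has to be idempotent,
    so that re-filtering an already stored object yields the same name.
    """
    filtered = fn(content)
    if fn(filtered) != filtered:
        raise ValueError("fn() is not idempotent for this input")
    name = hashlib.sha256(filtered).hexdigest()
    object_map[name] = filtered
    return name


def strip_trailing_whitespace(content):
    """Example hook (an assumption, not a git feature): drop trailing
    whitespace and end the content with exactly one newline."""
    return content.rstrip() + b"\n"


if __name__ == "__main__":
    store = {}
    first = filtered_store(store, b"hello  \n\n", strip_trailing_whitespace)
    again = filtered_store(store, store[first], strip_trailing_whitespace)
    # Storing the already-filtered object maps to the same name.
    print(first == again)
```

The idempotency check is there because an object that has already been
through fn() must map to the same name when it is seen again; a hook
like "append a byte" would fail that check and be rejected.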