On Wed, Jan 06 2021, brian m. carlson wrote:

> On 2021-01-05 at 14:21:40, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Sun, Jan 03 2021, brian m. carlson wrote:
>>
>> I think it makes sense to split up 1-4/5 here from 5/5 in this series
>> since they're really unrelated changes, although due to the changes in
>> 1-4 they'll conflict.
>
> Okay, I'll drop them.

Not replying to most of this E-Mail because I think there's nothing left
to add / you clarified things for me in those cases / we respectfully
disagree / any outstanding points we can pick up in your re-roll /
whatever :)

>> So we're talking about hiding the old E-Mail, presumably because it was
>> joe@ instead of jane@, so in that case we could just support URI
>> encoding:
>>
>>     Jane Doe <jane@xxxxxxxxxxx>
>>     <jane@xxxxxxxxxxx> <%6A%6F%65@%64%65%76%65%6C%6F%70%65%72.%63%6F%6D>
>>
>> Made via:
>>
>>     $ perl -MURI::Escape=uri_escape -wE 'say uri_escape q[joe@xxxxxxxxxxxxx], "^@."'
>>     %6A%6F%65@%64%65%76%65%6C%6F%70%65%72.%63%6F%6D
>>
>> Which also has the nice attribute that people can make it obvious what
>> part they want to hide, since this is really a feature to enable social
>> politeness & consideration:
>>
>>     Jane Doe <jane@xxxxxxxxxxx>
>>     # I don't want to be known by my old name, thanks
>>     <jane@xxxxxxxxxxx> <%6A%6F%65@xxxxxxxxxxxxx>
>
> I don't think this feature is going to get used if we just encode names
> or email addresses. In the United States, when someone transitions,
> they get a court order to change their name. I don't think a lot of
> corporate environments are going to want to just encode an old name or
> email address in a trivially invertible way given that. This is
> typically a topic handled with some sensitivity in most companies.
>
> I will tell you that I would not just use an encoded version if I were
> changing my name for any of the reasons I've mentioned. That wouldn't
> cut it for me, and I wouldn't use such a feature.
> The feature I'm implementing is a feature I've talked with trans folks
> about, and that's why I'm implementing this as it is. The response I got
> was essentially, "It's not everything I want, but it's an improvement."
>
> If the decision is that we want to go with encoding instead of hashing,
> then I'll drop this patch. I'm not going to put my name or sign-off on
> that because I don't think it meets the need I'm addressing here.
>
> The entire problem, of course, is that we bake a human's personal name
> and email address immutably into a Merkle tree. We know full well that
> people do change their names and email addresses all the time (e.g.,
> marriage, job changes), and yet we have this design. In retrospect, we
> should have done something different, but hindsight is 20/20 and I'm
> just trying to do the best we can with what we've got.

Doesn't the difference in some sense boil down to either an implicit
promise or an implicit assumption that the hashed version is forever
going to be protected by some security-through-obscurity/inconvenience
when it comes to git.git & its default tooling?

And would those users be as comfortable with the difference between
encoded vs. hashed if e.g. "git check-mailmap" learned to read the
.mailmap and search-replace all the hashed versions with their
materialized values, or if popular tools like Emacs learned to view a
Git .mailmap in a "needs translation" mode similar to *.gpg and *.gz?
How about if popular web views of Git served up that materialized
"check-mailmap" output by default?

I don't think it's implausible that we'll get any of those as follow-up
patches. I might even submit some at some point, not out of spite, but
just because I don't want to maintain out-of-tree code for an
out-of-tree program that understands a Git .mailmap today, but where I'd
need to search-replace the hashed versions.
Ditto for popular editors or web viewers: it's very likely that they
will gain support for this, just because it's tedious to manually hash,
copy/paste & validate values.

In looking at some of the fsck code recently & having some
yet-unsubmitted patches I thought of trying to combine it with
mailmap. I.e. it seems like a natural feature for fsck to gain to warn
you about unused mailmap entries, just like it can warn about
unreachable/dangling objects. After all, these are really just sort-of
pointers into our Merkle tree. Spewing out all the mappings seems like
an obvious addition to that, e.g. spewing out an
"optimized/non-redundant" (plain or hashed) mailmap to re-commit.

That's the main reason I'm uncomfortable with this approach: it seems
to me to implicitly rely on things that are tedious now, but which the
march of history all but inevitably should make trivial if we were to
integrate it. Unless we're *also* promising to forever intentionally
(and artificially) keep it inconvenient.

Take the example in the v1 thread of how long it takes to clone &
extract this info from chromium.git. It seems like a fair assumption
that we'll have some future version of git where you can ask a remote
server about that sort of thing in milliseconds. Not because of this
hashed .mailmap thing in particular, just as an emergent effect of the
server being happy to serve up things it knows about the DAG from
having walked & cached it in general. E.g. info from the commit-graph,
what hash is contained in what ref, or how one value (such as a
.mailmap entry) maps to another etc.
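
To make the concern concrete, here is a minimal sketch of how little
code such "materializing" tooling would need. It is an illustration
only, not code from any patch in this series: the "@sha256:" marker,
the choice to hash just the e-mail address, and the helper names
hash_email(), decode_encoded() and materialize_hashed() are all
assumptions made for this example, and the addresses are made up.

```python
import hashlib
from urllib.parse import unquote

# Hypothetical marker for a hashed old address. The syntax actually used
# by the patch under discussion may differ; this is an assumption.
HASH_PREFIX = "@sha256:"


def hash_email(email: str) -> str:
    """Return the (hypothetical) hashed form of an e-mail address."""
    return HASH_PREFIX + hashlib.sha256(email.encode("utf-8")).hexdigest()


def decode_encoded(old: str) -> str:
    """Percent-encoded entries invert with no outside knowledge at all."""
    return unquote(old)


def materialize_hashed(hashed_entries, history_emails):
    """Map each hashed entry back to an address seen in commit history.

    Entries whose address never appears in history_emails map to None;
    those are the "unused" entries an fsck-style check could warn about.
    """
    by_hash = {hash_email(e): e for e in history_emails}
    return {h: by_hash.get(h) for h in hashed_entries}


if __name__ == "__main__":
    # Made-up data; real tooling would collect these by walking history.
    history = ["joe@example.com", "alice@example.com"]
    entries = [hash_email("joe@example.com"), hash_email("gone@example.com")]
    for hashed, plain in materialize_hashed(entries, history).items():
        print(hashed[:20] + "...", "->", plain)
    print(decode_encoded("%6A%6F%65@example.com"))
```

The encoded form needs nothing but the entry itself, while the hashed
form needs the set of addresses from history. But that set is exactly
what anything that has already walked the DAG has at hand, which is the
point about the inconvenience being temporary.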