On Sun, Nov 11, 2012 at 06:45:32PM +0100, Felipe Contreras wrote:

> > If there is a standard filter, then what is the advantage in doing it as
> > a pipe? Why not just teach fast-import the same trick (and possibly make
> > it optional)? That would be simpler, more efficient, and it would make
> > it easier for remote helpers to turn it on (they use a command-line
> > switch rather than setting up an extra process).
>
> Right, but instead of a command-line switch it probably should be
> enabled on the stream:
>
>   feature clean-authors
>
> Or something.

Yeah, I was thinking it would need a feature switch to the remote helper
to turn on the command-line option, but I forgot that fast-import can
take feature lines directly.

> > We can clean up and normalize
> > things like whitespace (and we probably should if we do not do so
> > already). But beyond that, we have no context about the name; only the
> > exporter has that.
>
> There is no context.

There may not be a lot, but there is some:

> These are exactly the same questions every exporter must answer. And
> there's no answer, because the field is not a git author, it's a
> mercurial user, or a bazaar committer, or who knows what.

The exporter knows that the field is a mercurial user (or whatever).
Fast-import does not even know that, and cannot apply any rules or
heuristics about the format of a mercurial user string, what is common
in the mercurial world, etc. It may not be a lot of context in some
cases (I do not know anything about mercurial's formats, so I can't say
what knowledge is available). But at least the exporter has a chance at
domain-specific interpretation of the string. Fast-import has no chance,
because it does not know the domain.

I've snipped the rest of your argument, which is basically that
mercurial does not have any context at all, and knowing that it is a
mercurial author is useless.
I am not sure that is true; even knowing that it is a
free-form field versus something structured (e.g., we know CVS authors
are usernames on the server) is useful.

But I would agree there are probably multiple systems that are like
mercurial in that the author field is usually something like "name
<email>", but may be arbitrary text (I assume bzr is the same way, but
you would know better than me). So it may make sense to have some stock
algorithm to try to convert arbitrary almost-name-and-email text into
name and email to reduce duplication between exporters, but:

  1. It must be turned on explicitly by the exporter, since we do not
     want to munge more structured input from clueful exporters.

  2. The exporter should only turn it on after replacing its own munging
     (e.g., it shouldn't be adding junk like <none@none>; fast-import
     would need to receive as pristine an input as possible).

  3. Exporters should not use it if they have any broken-down
     representation at all. Even knowing that the first half is a human
     name and the second half is something else would give it a better
     shot at cleaning than fast-import would get.

Alternatively, the feature could enable the exporter to pass a more
structured ident to git.

-Peff
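To make the "stock algorithm" idea a little more concrete, here is a minimal sketch in Python of the kind of best-effort splitting that could be shared between exporters once one opts in. The names `split_ident` and `fast_import_ident` are hypothetical (nothing like them exists in git or fast-import), and the heuristics are assumptions chosen for illustration: a trailing "<...>" is taken as the email, otherwise the first bare "x@y" token is, otherwise the whole string is treated as the name.

```python
import re
from typing import Tuple

# Hypothetical helper (not part of git or fast-import): a best-effort split of
# free-form "almost name-and-email" text into a (name, email) pair.
_BRACKETED = re.compile(r"^(?P<name>[^<>]*)<(?P<email>[^<>]*)>$")
_BARE_EMAIL = re.compile(r"[^\s<>]+@[^\s<>]+")


def _squash(text: str) -> str:
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return " ".join(text.split())


def _clean_name(text: str) -> str:
    """Drop angle brackets, which fast-import does not accept in a name."""
    return _squash(text.replace("<", " ").replace(">", " "))


def split_ident(raw: str) -> Tuple[str, str]:
    text = _squash(raw)

    # Case 1: already "Name <email>"; keep both halves as given.
    m = _BRACKETED.match(text)
    if m:
        return _clean_name(m.group("name")), m.group("email").strip()

    # Case 2: a bare address somewhere in the text, e.g. "jdoe@example.com"
    # or "John Doe jdoe@example.com"; whatever is left over becomes the name.
    m = _BARE_EMAIL.search(text)
    if m:
        leftover = text[: m.start()] + " " + text[m.end():]
        return _clean_name(leftover), m.group(0)

    # Case 3: no recognizable address; keep everything as the name and leave
    # the email empty rather than inventing something like <none@none>.
    return _clean_name(text), ""


def fast_import_ident(raw: str) -> str:
    """Format the "name <email>" part of a fast-import author/committer line."""
    name, email = split_ident(raw)
    return f"{name} <{email}>" if name else f"<{email}>"


if __name__ == "__main__":
    for raw in ["John Doe <jdoe@example.com>", "jdoe@example.com", "  John   Doe  "]:
        print(fast_import_ident(raw))
    # John Doe <jdoe@example.com>
    # <jdoe@example.com>
    # John Doe <>
```

The sketch never synthesizes a placeholder address, which is why it only makes sense on input the exporter has left pristine; an exporter that already has a broken-down name and email should skip it and pass those fields through directly.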