On Sun, Nov 11, 2012 at 6:15 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Sun, Nov 11, 2012 at 12:00:44PM -0500, A Large Angry SCM wrote:

> If there is a standard filter, then what is the advantage in doing it as
> a pipe? Why not just teach fast-import the same trick (and possibly make
> it optional)? That would be simpler, more efficient, and it would make
> it easier for remote helpers to turn it on (they use a command-line
> switch rather than setting up an extra process).

Right, but instead of a command-line switch it probably should be
enabled on the stream:

  feature clean-authors

Or something.

> But what I don't understand is: what would such a standard filter look
> like? Fast-import (or a filter) would already receive the exporter's
> best attempt at a git-like ident string.

Currently, yeah, because there's no other option. It's either try to
clean it up, or fail. But if 'git fast-import' had a superior
alternative, I certainly would remove my custom code and enable that
feature.

> We can clean up and normalize
> things like whitespace (and we probably should if we do not do so
> already). But beyond that, we have no context about the name; only the
> exporter has that.

There is no context.

> So if we receive:
>
>   Foo Bar<foo.bar@xxxxxxxxxxx> <none@none>
>
> or:
>
>   Foo Bar<foo.bar@xxxxxxxxxxx <none@none>
>
> or:
>
>   Foo Bar<foo.bar@xxxxxxxxxxx
>
> what do we do with it? Is the first part a malformed name/email pair,
> and the second part is crap added by a lazy exporter? Or does the
> exporter want to keep the angle brackets as part of the name field? Is
> there a malformed email in the last one, or no email at all?

These are exactly the same questions every exporter must answer. And
there's no answer, because the field is not a git author, it's a
mercurial user, or a bazaar committer, or who knows what.

From whatever source, these all might be valid authors:

  john
  john <john@xxxxxxxxxxxx>
  (grease)
  <test@xxxxxxxx>
  test@xxxxxxxx
  test<test@xxxxxxxx>
  test <test@xxxxxxxx
  test # a space
  test < test@xxxxxxxx >
  test >test@xxxxxxx>
  test <test <at> test <dot> com>
  <>
  > <
  The first chapter of the LOTR

There is no context.

> The exporter is the only program that actually knows where the data came
> from,

It doesn't matter where it came from, it's not a name/email pair.

> how it should be broken down,

It cannot be broken down, it's free-form text. Any text.

> and what is appropriate for pulling
> data out of its particular source system.

This free-form text is the lowest granularity. There is nothing else.

> For that reason, the exporter
> has to be the place where we come up with a syntactically correct and
> unambiguous ident.

*If* the exporter is able to do this, sure, but many don't have any
more information. See:

  % hg commit -u 'Foo Bar<foo.bar@xxxxxxxxxxx> <none@none>' -m one
  % hg --debug log
  changeset:   0:5ef37a2c773f02d0e01f1ecdcc59149832d294e8
  tag:         tip
  phase:       draft
  parent:      -1:0000000000000000000000000000000000000000
  parent:      -1:0000000000000000000000000000000000000000
  manifest:    0:c6d4cd25b9fc2f83b0dd51f4acbea9486fce54d7
  user:        Foo Bar<foo.bar@xxxxxxxxxxx> <none@none>
  date:        Sun Nov 11 18:33:00 2012 +0100
  files+:      file
  extra:       branch=default
  description:
  one

What is a hg exporter tool supposed to do with that? What such a tool
can do, 'git fast-import' can do.

> I am not opposed to adding a mailmap-like feature to fast-import to map
> identities, but it has to start with sane, unambiguous output from the
> exporter.

And if that's not possible?

-- 
Felipe Contreras
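
As a rough illustration of the kind of cleanup discussed in this
message, here is a minimal Python sketch of what an exporter (or a
hypothetical cleaning step in 'git fast-import') could do with such
free-form user strings. The function name sanitize_ident, the "Unknown"
fallback and the heuristics themselves are assumptions made for
illustration; they are not what git or any remote helper implements.
The only constraint taken from git is that the name and email fields
must not contain '<', '>' or a newline. The addresses use example.com
as placeholders.

```python
import re


def sanitize_ident(raw: str, default_name: str = "Unknown") -> str:
    """Turn free-form user text into 'Name <email>' (heuristic sketch)."""
    # Collapse all whitespace runs (including newlines) into single spaces.
    text = " ".join(raw.split())

    # Everything before the first '<' is treated as the name.
    name, sep, rest = text.partition("<")
    if sep:
        # The email is whatever follows, up to the next '<' or '>' (or the end).
        email = re.split(r"[<>]", rest, maxsplit=1)[0]
    elif "@" in text and " " not in text:
        # A bare token containing '@' is assumed to be an address.
        name, email = "", text
    else:
        # No brackets and nothing address-like: keep it all as the name.
        email = ""

    # Remove any bracket that would break the ident syntax.
    name = re.sub(r"[<>]", "", name).strip()
    email = re.sub(r"[<>]", "", email).strip()

    if not name:
        # Fall back to the local part of the address, then to a default.
        name = email.partition("@")[0] or default_name
    return f"{name} <{email}>"


if __name__ == "__main__":
    for raw in (
        "john",
        "Foo Bar<foo.bar@example.com> <none@none>",
        "Foo Bar<foo.bar@example.com <none@none>",
        "Foo Bar<foo.bar@example.com",
        "<>",
    ):
        print(repr(raw), "->", repr(sanitize_ident(raw)))
```

The result is always syntactically valid, but the sketch has no way to
know whether a trailing '<none@none>' was junk or intent; it simply
drops everything after the first bracketed part. That is the same
guess any exporter would have to make.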