On Sun, Nov 11, 2018 at 12:32:22AM -0800, Elijah Newren wrote: > > > Documentation/git-fast-export.txt | 7 +++++++ > > > builtin/fast-export.c | 20 +++++++++++++++----- > > > fast-import.c | 17 +++++++++++++++++ > > > t/t9350-fast-export.sh | 17 +++++++++++++++++ > > > 4 files changed, 56 insertions(+), 5 deletions(-) > > > > The fast-import format is documented in Documentation/git-fast-import.txt. > > It might need an update to cover the new format. > > We document the format in both fast-import.c and > Documentation/git-fast-import.txt? Maybe we should delete the long > comments in fast-import.c so this isn't duplicated? Yes, that is probably worth doing (see the comment at the top of fast-import.c). Some information might need to be migrated. If we're going to have just one spot, I think it needs to be the user-facing documentation. This is a public interface that other people are building compatible implementations for (including your new tool). > > > +--show-original-ids:: > > > + Add an extra directive to the output for commits and blobs, > > > + `originally <SHA1SUM>`. While such directives will likely be > > > + ignored by importers such as git-fast-import, it may be useful > > > + for intermediary filters (e.g. for rewriting commit messages > > > + which refer to older commits, or for stripping blobs by id). > > > > I'm not quite sure how a blob ends up being rewritten by fast-export (I > > get that commits may change due to dropping parents). > > It doesn't get rewritten by fast-export; it gets rewritten by other > intermediary filters, e.g. in something like this: > > git fast-export --show-original-ids --all | intermediary_filter | > git fast-import > > The intermediary_filter program may want to strip out blobs by id, or > remove filemodify and filedelete directives unless they touch certain > paths, etc. OK, that matches my understanding. So why does fast-export need to print the blob ids? If the intermediary is rewriting blobs, it can then produce the "originally" line itself, can't it? The more interesting case I guess is your "strip out blobs by id" example. There the intermediary _could_ do so itself, but it would require recomputing the object id of each blob. If you use "--no-data", then this just works (we specify tree entries by object id, rather than by mark). But I can see how it would be useful to have the information even without "--no-data" (i.e., if you are doing multiple kinds of rewrites on a single stream). I think the thing that confused me is that this "originally" is doing two things: - mentioning blob ids as an optimization / convenience for the reader - mentioning rewritten commit (and presumably tag?) ids that were rewritten as part of a partial history export. I suppose even trees could be rewritten that way, too, but fast-import doesn't generally consider trees to be a first-class item. So I'm OK with it, but I wonder if there is an easier way to explain it. -Peff