On Sat, Nov 10, 2018 at 11:20 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Sat, Nov 10, 2018 at 10:23:11PM -0800, Elijah Newren wrote:
>
> > Knowing the original names (hashes) of commits, blobs, and tags can
> > sometimes enable post-filtering that would otherwise be difficult or
> > impossible.  In particular, the desire to rewrite commit messages which
> > refer to other prior commits (on top of whatever other filtering is
> > being done) is very difficult without knowing the original names of each
> > commit.
> >
> > This commit teaches a new --show-original-ids option to fast-export
> > which will make it add a 'originally <hash>' line to blob, commits, and
> > tags.  It also teaches fast-import to parse (and ignore) such lines.
>
> Makes sense as a feature; I think filter-branch can make its mappings
> available, too.
>
> Do we need to worry about compatibility with other fast-import programs?
> I think no, because this is not enabled by default (so if sending the
> extra lines to another importer hurts, the answer is "don't do that").
>
> I have a vague feeling that there might be some way to combine this with
> --export-marks or --no-data, but I can't really think of a way. They
> seem related, but not quite.
>
> > ---
> >  Documentation/git-fast-export.txt |  7 +++++++
> >  builtin/fast-export.c             | 20 +++++++++++++++-----
> >  fast-import.c                     | 17 +++++++++++++++++
> >  t/t9350-fast-export.sh            | 17 +++++++++++++++++
> >  4 files changed, 56 insertions(+), 5 deletions(-)
>
> The fast-import format is documented in Documentation/git-fast-import.txt.
> It might need an update to cover the new format.

We document the format in both fast-import.c and
Documentation/git-fast-import.txt?  Maybe we should delete the long
comments in fast-import.c so this isn't duplicated?

> > --- a/Documentation/git-fast-export.txt
> > +++ b/Documentation/git-fast-export.txt
> > @@ -121,6 +121,13 @@ marks the same across runs.
> >  	used by a repository which already contains the necessary
> >  	parent commits.
> >
> > +--show-original-ids::
> > +	Add an extra directive to the output for commits and blobs,
> > +	`originally <SHA1SUM>`.  While such directives will likely be
> > +	ignored by importers such as git-fast-import, it may be useful
> > +	for intermediary filters (e.g. for rewriting commit messages
> > +	which refer to older commits, or for stripping blobs by id).
>
> I'm not quite sure how a blob ends up being rewritten by fast-export (I
> get that commits may change due to dropping parents).

It doesn't get rewritten by fast-export; it gets rewritten by other
intermediary filters, e.g. in something like this:

   git fast-export --show-original-ids --all | intermediary_filter | git fast-import

The intermediary_filter program may want to strip out blobs by id, or
remove filemodify and filedelete directives unless they touch certain
paths, etc.

> The name "originally" doesn't seem great to me. Probably because I would
> continually wonder if it has one "l" or two. ;) Perhaps something like
> "original-oid" might be better. That's well into bikeshed territory,
> though.

I wasn't a huge fan of "originally" either, but I just couldn't come
up with anything else that wasn't really long.  I'd be happy to switch
to original-oid.
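
To make the blob-stripping case concrete, here is a rough sketch of what
an intermediary_filter might look like.  It is only an illustration under
several assumptions: the directive is spelled `originally` as in this
version of the patch (pass directive=b"original-oid" if the name changes),
it appears among the blob header lines before `data`, and the stream uses
only the `data <count>` form.  The function name strip_blobs is made up
for this example.

```python
import io


def strip_blobs(src, dst, unwanted_ids, directive=b"originally"):
    """Copy a fast-export stream from src to dst, dropping unwanted blobs.

    unwanted_ids is a set of original blob ids (bytes).  Filemodify lines
    that refer to the mark of a dropped blob are removed as well.
    """
    dropped_marks = set()
    pending = None  # a line read ahead that still has to be processed
    while True:
        line = pending if pending is not None else src.readline()
        pending = None
        if not line:
            break
        if line == b"blob\n":
            chunk, mark, orig_id = [line], None, None
            # Collect the blob header up to 'data <count>', then the payload.
            while True:
                line = src.readline()
                if not line:
                    raise ValueError("unexpected end of stream inside blob")
                chunk.append(line)
                if line.startswith(b"mark "):
                    mark = line.split()[1]
                elif line.startswith(directive + b" "):
                    orig_id = line.split()[1]
                elif line.startswith(b"data "):
                    chunk.append(src.read(int(line.split()[1])))
                    break
            # The payload may be followed by an optional newline; keep it
            # with the blob so dropping the blob leaves no stray blank line.
            nxt = src.readline()
            if nxt == b"\n":
                chunk.append(nxt)
            else:
                pending = nxt
            if orig_id in unwanted_ids:
                if mark is not None:
                    dropped_marks.add(mark)
                continue
            dst.write(b"".join(chunk))
        elif line.startswith(b"data "):
            # Pass commit/tag messages through untouched, so message text
            # that happens to start with "M " is never mistaken for a command.
            dst.write(line)
            dst.write(src.read(int(line.split()[1])))
        elif line.startswith(b"M "):
            parts = line.split(b" ", 3)  # M <mode> <dataref> <path>
            if len(parts) == 4 and parts[2] in dropped_marks:
                continue
            dst.write(line)
        else:
            dst.write(line)
```

In the pipeline above, this could be wired up by calling
strip_blobs(sys.stdin.buffer, sys.stdout.buffer, ids) with whatever set
of ids should be removed.  Filtering filemodify and filedelete lines by
path would follow the same pattern in the `M`/`D` branch.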