From: Johannes Schindelin <johannes.schindelin@xxxxxx>

The tar importer in `contrib/fast-import/import-tars.perl` has a very
convenient feature: if _all_ paths stored in the imported `.tar` start
with a common prefix, e.g. `git-2.26.0/` in the tar at
https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
stripped.

This feature makes a ton of sense because it is relatively common to
import two or more revisions of the same project into Git, and obviously
we don't want all files to live in a tree whose name changes from
revision to revision.

Now, the problem with that feature is that it breaks down if there is a
`pax_global_header` "file" located outside of said prefix, at the top of
the tree. This is the case for `.tar` files generated by Git's very own
`git archive` command: it inserts that header, and `git archive` allows
specifying a common prefix (that the header does _not_ share with the
other files contained in the archive) via `--prefix=my-project-1.0.0/`.

Let's just skip any global header when importing `.tar` files into Git.

Note: this global header might contain useful information. For example,
in the output of `git archive`, it lists the original commit, which _is_
useful information. A future improvement to the `import-tars.perl`
script might be to include that information in the commit message, or do
other things with the information (e.g. use `mtime` information
contained in the global header as date of the commit). This patch does
not prevent any future patch from making that happen, it only prevents
the header from being treated as if it was a regular file.

Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
    Ignore the global PAX header in import-tars.perl

    This problem came up in Pacman-related work, where PKGBUILD definitions
    would reference the tarballs downloaded from GitHub, and patches would
    be applied on top. To work on those patches efficiently (e.g.
    when an upgrade to a new version of the project no longer lets those
    patches apply), I need to be able to import those tarballs into
    playground worktrees and work on them.

    I like to use contrib/fast-import/import-tars.perl for that purpose, but
    it really needs to strip the prefix, otherwise it is too tedious to work
    with it.

    Changes since v1:

     * Mentioned the implicit prefix-stripping feature of import-tars.perl
       in the commit message; without that context, it is really hard to
       understand the motivation for this patch.
     * Clarified in the commit message that this patch does not prevent any
       future patches that would use the information contained in the
       global header.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/577

Range-diff vs v1:

 1:  718bde8f4a7 ! 1:  842dabe6128 import-tars: ignore the global PAX header
     @@ -2,12 +2,34 @@
          import-tars: ignore the global PAX header

     -    Git's own `git archive` inserts that header, but it often gets into the
     -    way of `import-tars.perl` e.g. when a prefix was specified (for example
     -    via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
     -    GitHub releases): this prefix _should_ be stripped.
     +    The tar importer in `contrib/fast-import/import-tars.perl` has a very
     +    convenient feature: if _all_ paths stored in the imported `.tar` start
     +    with a common prefix, e.g. `git-2.26.0/` in the tar at
     +    https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
     +    stripped.

     -    Let's just skip it.
     +    This feature makes a ton of sense because it is relatively common to
     +    import two or more revisions of the same project into Git, and obviously
     +    we don't want all files to live in a tree whose name changes from
     +    revision to revision.
     +
     +    Now, the problem with that feature is that it breaks down if there is a
     +    `pax_global_header` "file" located outside of said prefix, at the top of
     +    the tree. This is the case for `.tar` files generated by Git's very own
     +    `git archive` command: it inserts that header, and `git archive` allows
     +    specifying a common prefix (that the header does _not_ share with the
     +    other files contained in the archive) via `--prefix=my-project-1.0.0/`.
     +
     +    Let's just skip any global header when importing `.tar` files into Git.
     +
     +    Note: this global header might contain useful information. For example,
     +    in the output of `git archive`, it lists the original commit, which _is_
     +    useful information. A future improvement to the `import-tars.perl`
     +    script might be to include that information in the commit message, or do
     +    other things with the information (e.g. use `mtime` information
     +    contained in the global header as date of the commit). This patch does
     +    not prevent any future patch from making that happen, it only prevents
     +    the header from being treated as if it was a regular file.

          Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>


 contrib/fast-import/import-tars.perl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index e800d9f5c9c..d50ce26d5d9 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -139,6 +139,8 @@
 			print FI "\n";
 		}
 
+		next if ($typeflag eq 'g'); # ignore global header
+
 		my $path;
 		if ($prefix) {
 			$path = "$prefix/$name";

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
-- 
gitgitgadget
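
For readers who want to see why a single top-level `pax_global_header`
entry defeats the prefix stripping, here is a minimal Python sketch of the
idea. It is not the Perl code from `import-tars.perl`: the names `Entry`
and `common_top_dir` are hypothetical, and the prefix detection is a
simplified model that assumes the importer compares the first path
component of every entry it sees.

```python
from typing import Iterable, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for one tar header record."""
    typeflag: str  # e.g. '0' regular file, '5' directory, 'g' global pax header
    name: str      # path as stored in the archive


def common_top_dir(entries: Iterable[Entry],
                   skip_global_header: bool = True) -> Optional[str]:
    """Return the top-level directory shared by all entries, or None.

    Simplified model (an assumption, not the script's actual logic):
    a prefix exists only if every considered entry lives below the
    same first path component.
    """
    top = None
    for entry in entries:
        # The step the patch adds: ignore the global pax header entirely.
        if skip_global_header and entry.typeflag == "g":
            continue
        first, sep, _rest = entry.name.partition("/")
        if not sep:
            # A top-level entry without a directory part: nothing to strip.
            return None
        if top is None:
            top = first
        elif top != first:
            return None
    return top


# Shape of an archive produced with `--prefix=my-project-1.0.0/`.
entries = [
    Entry("g", "pax_global_header"),
    Entry("5", "my-project-1.0.0/"),
    Entry("0", "my-project-1.0.0/README"),
]

print(common_top_dir(entries, skip_global_header=False))  # None
print(common_top_dir(entries))                            # my-project-1.0.0
```

With the header treated as a regular file, no common prefix is found; once
entries of type `g` are skipped, `my-project-1.0.0` is detected and can be
stripped.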