[PATCH v2] import-tars: ignore the global PAX header

From: Johannes Schindelin <johannes.schindelin@xxxxxx>

The tar importer in `contrib/fast-import/import-tars.perl` has a very
convenient feature: if _all_ paths stored in the imported `.tar` start
with a common prefix, e.g. `git-2.26.0/` in the tar at
https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
stripped.

This feature makes a ton of sense because it is relatively common to
import two or more revisions of the same project into Git, and obviously
we don't want all files to live in a tree whose name changes from
revision to revision.

Now, the problem with that feature is that it breaks down if there is a
`pax_global_header` "file" located outside of said prefix, at the top of
the tree. This is the case for `.tar` files generated by Git's very own
`git archive` command: it inserts that header, and `git archive` allows
specifying a common prefix (that the header does _not_ share with the
other files contained in the archive) via `--prefix=my-project-1.0.0/`.

Let's just skip any global header when importing `.tar` files into Git.
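
To illustrate the failure mode and the fix described above, here is a
simplified sketch in Python (not the Perl code of `import-tars.perl`). The
helper names `common_top_dir` and `import_paths` and the sample entries are
hypothetical, and the prefix detection is only assumed to resemble what the
script does.

```python
def common_top_dir(paths):
    """Return the leading directory shared by all paths, or None."""
    tops = set()
    for path in paths:
        head, sep, _ = path.partition("/")
        if not sep:
            # A top-level entry (such as pax_global_header) means there is
            # no directory prefix common to every path.
            return None
        tops.add(head)
    return tops.pop() if len(tops) == 1 else None


def import_paths(entries):
    """Skip global headers (typeflag 'g'), then strip the common prefix.

    `entries` is a list of (typeflag, path) pairs, as read from tar headers.
    """
    kept = [path for typeflag, path in entries if typeflag != "g"]
    top = common_top_dir(kept)
    if top is None:
        return kept
    return [path[len(top) + 1:] for path in kept]


# Hypothetical contents of an archive made with --prefix=my-project-1.0.0/
entries = [
    ("g", "pax_global_header"),
    ("0", "my-project-1.0.0/README"),
    ("0", "my-project-1.0.0/src/main.c"),
]
print(common_top_dir([path for _, path in entries]))
print(import_paths(entries))
```

With the header counted as a file, no common prefix is found; once entries
of type `g` are skipped, the remaining paths share `my-project-1.0.0/` and
it can be stripped.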

Note: this global header might contain useful information. For example,
in the output of `git archive`, it lists the original commit, which _is_
useful information. A future improvement to the `import-tars.perl`
script might be to include that information in the commit message, or do
other things with the information (e.g. use `mtime` information
contained in the global header as date of the commit). This patch does
not prevent any future patch from making that happen, it only prevents
the header from being treated as if it was a regular file.
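
As a rough sketch of what such a future improvement would read, the
following Python example (using the standard `tarfile` module, not the Perl
script) builds a small archive with a global PAX header carrying a `comment`
record, which is where `git archive` stores the commit ID, and reads it
back. The helper names, the file name and the commit ID are made up for
illustration.

```python
import io
import tarfile


def make_archive(commit_id):
    """Build an in-memory tar whose global PAX header has a comment record."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": commit_id}) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("my-project-1.0.0/README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_global_comment(raw):
    """Return the comment record of the global PAX header, if present."""
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        tar.getmembers()  # make sure all headers have been processed
        return tar.pax_headers.get("comment")


# Placeholder commit ID, not a real commit
commit_id = "0123456789abcdef0123456789abcdef01234567"
raw = make_archive(commit_id)
print(read_global_comment(raw))
```

Note that `tarfile` consumes the global header itself rather than listing it
as a member, so the only archive member here is the `README` file.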

Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
    Ignore the global PAX header in import-tars.perl
    
    This problem came up in Pacman-related work, where PKGBUILD definitions
    would reference the tarballs downloaded from GitHub, and patches would
    be applied on top. To work on those patches efficiently (e.g. when an
    upgrade to a new version of the project no longer lets those patches
    apply), I need to be able to import those tarballs into playground
    worktrees and work on them. I like to use 
    contrib/fast-import/import-tars.perl for that purpose, but it really
    needs to strip the prefix, otherwise it is too tedious to work with it.
    
    Changes since v1:
    
     * Mentioned the implicit prefix-stripping feature of import-tars.perl
       in the commit message; without that context, it is really hard to
       understand the motivation for this patch.
     * Clarified in the commit message that this patch does not prevent any
       future patches that would use the information contained in the global
       header.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/577

Range-diff vs v1:

 1:  718bde8f4a7 ! 1:  842dabe6128 import-tars: ignore the global PAX header
     @@ -2,12 +2,34 @@
      
          import-tars: ignore the global PAX header
      
     -    Git's own `git archive` inserts that header, but it often gets into the
     -    way of `import-tars.perl` e.g. when a prefix was specified (for example
     -    via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
     -    GitHub releases): this prefix _should_ be stripped.
     +    The tar importer in `contrib/fast-import/import-tars.perl` has a very
     +    convenient feature: if _all_ paths stored in the imported `.tar` start
     +    with a common prefix, e.g. `git-2.26.0/` in the tar at
     +    https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
     +    stripped.
      
     -    Let's just skip it.
     +    This feature makes a ton of sense because it is relatively common to
     +    import two or more revisions of the same project into Git, and obviously
     +    we don't want all files to live in a tree whose name changes from
     +    revision to revision.
     +
     +    Now, the problem with that feature is that it breaks down if there is a
     +    `pax_global_header` "file" located outside of said prefix, at the top of
     +    the tree. This is the case for `.tar` files generated by Git's very own
     +    `git archive` command: it inserts that header, and `git archive` allows
     +    specifying a common prefix (that the header does _not_ share with the
     +    other files contained in the archive) via `--prefix=my-project-1.0.0/`.
     +
     +    Let's just skip any global header when importing `.tar` files into Git.
     +
     +    Note: this global header might contain useful information. For example,
     +    in the output of `git archive`, it lists the original commit, which _is_
     +    useful information. A future improvement to the `import-tars.perl`
     +    script might be to include that information in the commit message, or do
     +    other things with the information (e.g. use `mtime` information
     +    contained in the global header as date of the commit). This patch does
     +    not prevent any future patch from making that happen, it only prevents
     +    the header from being treated as if it was a regular file.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
      


 contrib/fast-import/import-tars.perl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index e800d9f5c9c..d50ce26d5d9 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -139,6 +139,8 @@
 			print FI "\n";
 		}
 
+		next if ($typeflag eq 'g'); # ignore global header
+
 		my $path;
 		if ($prefix) {
 			$path = "$prefix/$name";

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
-- 
gitgitgadget


