On 03.03.23 at 16:38, Cristian Le wrote:
>> In your issue #444 you write that "git archive HEAD" works, but
>> "git archive HEAD:./" doesn't.  Why do you need to use the latter?
>
> Specifically we want to allow for `HEAD:./sub_dir` where `./sub_dir`
> contains `.gitattributes` and `.git_archive.txt`.
>
> Alternatively, it would be helpful if we can pass `--transform`
> commands of `tar` directly so that we can change the paths.
>
> Overall what we are doing in tito is that the source would be in
> `./src` and outside is metadata like `./my_package.spec`. We are
> using `git archive HEAD:./src --prefix=my_package-1.0.0` to pass the
> appropriate form that the rpm spec file can locate. In a tar command
> we can use `--transform=s|^src/|my_package-1.0.0/|` to achieve the
> equivalent.

What is Tito?  https://github.com/rpm-software-management/tito says:
"Tito is a tool for managing RPM based projects using git for their
source code repository."  It supports Git repositories containing
multiple projects.  I suppose that means, e.g. for Git's own repo, that
Tito would allow creating a separate RPM file for git-gui.

Side note: Tito features include: "Create reliable tar.gz files with
consistent checksums from any tag."  That's achieved by compressing
using "gzip -n -c".  Avoiding the native tgz support of git archive --
probably only because the code predates it -- shields Tito from the
change to use our internal gzip implementation discussed recently in
https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@xxxxxxxxx/

Note, however, that the tar output of git archive is not guaranteed to
be stable between Git versions, either.  Adding such a stable format
was proposed recently in
https://lore.kernel.org/git/20230205221728.4179674-1-sandals@xxxxxxxxxxxxxxxxxxxx/

The code for calling git archive with a tree was present in Tito's
initial commit, which says that it was taken from Spacewalk:
https://github.com/rpm-software-management/tito/commit/e87345d7b7.
There it was introduced along with a script that changes the mtime of
archive entries from the current time to the commit timestamp by
https://github.com/spacewalkproject/spacewalk/commit/34267e39d472.  I
don't fully understand the explanation in its commit message ("make it
possible to call make srpm even if the directory of the package has
changed"); perhaps it requires more domain knowledge.  But I can
understand the need for archiving sub-directories in the context of
supporting multi-project repositories.

> However we cannot use the `tar` directly because that would affect
> the timestamps and permissions of the file that are set by `git
> archive`.

GNU tar has the options --mode and --mtime to choose the permissions
and modification times of files added to an archive.  git archive is
going to get an --mtime option as well in the next release, by the way.

> So allowing for something like `git archive HEAD
> --transform=s|^src/|my_package-1.0.0/|`, where the transform is done
> after `.gitattributes` is performed would solve this issue.

GNU tar has this --transform option, bsdtar similarly has -s.  Both also
have --strip-components (GNU tar only for extraction, though), which is
a bit simpler and should suffice for your use case.

--- >8 ---
Subject: [PATCH] archive: add --strip-components

Allow removing leading elements from paths of archive entries.  That's
useful when archiving sub-directories and not wanting to keep the
common path prefix, e.g.:

   $ git archive --strip-components=1 HEAD sha1dc | tar tf -
   .gitattributes
   LICENSE.txt
   sha1.c
   sha1.h
   ubc_check.c
   ubc_check.h

The same can be achieved by specifying a tree instead of a commit and
a pathspec:

   $ git archive HEAD:sha1dc | tar tf -
   .gitattributes
   LICENSE.txt
   sha1.c
   sha1.h
   ubc_check.c
   ubc_check.h

However, this doesn't support the export-subst attribute, doesn't
include the commit hash as an archive comment and uses the current time
instead of the commit date as mtime for archive entries.

The new option is adapted from bsdtar.  GNU tar provides it as well, but
only for extraction.

The new option does not affect the paths of entries added by --add-file
and --add-virtual-file because they are handcrafted to their desired
values already.  Similarly, the value of --prefix is not subject to
component stripping.

Signed-off-by: René Scharfe <l.s.r@xxxxxx>
---
 Documentation/git-archive.txt |  6 ++++++
 archive.c                     | 16 ++++++++++++++++
 archive.h                     |  1 +
 t/t5000-tar-tree.sh           | 13 +++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 6bab201d37..5dad917e7b 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -55,6 +55,12 @@ OPTIONS
 	rightmost value is used for all tracked files.  See below which
 	value gets used by `--add-file` and `--add-virtual-file`.
 
+--strip-components=<n>::
+	Remove the specified number of leading path elements.  Pathnames
+	with fewer elements will be silently skipped.  Does not affect
+	the prefix added by `--prefix`, nor entries added with
+	`--add-file` or `--add-virtual-file`.
+
 -o <file>::
 --output=<file>::
 	Write the archive to <file> instead of stdout.
diff --git a/archive.c b/archive.c
index 9aeaf2bd87..8308d4d9c4 100644
--- a/archive.c
+++ b/archive.c
@@ -166,6 +166,18 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		args->convert = check_attr_export_subst(check);
 	}
 
+	if (args->strip_components > 0) {
+		size_t orig_baselen = baselen;
+		for (int i = 0; i < args->strip_components; i++) {
+			const char *slash = memchr(base, '/', baselen);
+			if (!slash)
+				return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
+			baselen -= slash - base + 1;
+			base = slash + 1;
+		}
+		strbuf_remove(&path, args->baselen, orig_baselen - baselen);
+	}
+
 	if (args->verbose)
 		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
 
@@ -593,12 +605,15 @@ static int parse_archive_args(int argc, const char **argv,
 	int verbose = 0;
 	int i;
 	int list = 0;
+	int strip_components = 0;
 	int worktree_attributes = 0;
 	struct option opts[] = {
 		OPT_GROUP(""),
 		OPT_STRING(0, "format", &format, N_("fmt"), N_("archive format")),
 		OPT_STRING(0, "prefix", &base, N_("prefix"),
 			N_("prepend prefix to each pathname in the archive")),
+		OPT_INTEGER(0, "strip-components", &strip_components,
+			N_("remove leading path elements")),
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
@@ -675,6 +690,7 @@ static int parse_archive_args(int argc, const char **argv,
 	args->baselen = strlen(base);
 	args->worktree_attributes = worktree_attributes;
 	args->mtime_option = mtime_option;
+	args->strip_components = strip_components;
 
 	return argc;
 }
diff --git a/archive.h b/archive.h
index 7178e2a9a2..e9becbd57d 100644
--- a/archive.h
+++ b/archive.h
@@ -23,6 +23,7 @@ struct archiver_args {
 	unsigned int worktree_attributes : 1;
 	unsigned int convert : 1;
 	int compression_level;
+	int strip_components;
 	struct string_list extra_files;
 	struct pretty_print_context *pretty_ctx;
 };
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 918a2fc7c6..629d2e78d7 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -271,6 +271,19 @@ test_expect_success 'git get-tar-commit-id' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git archive --strip-components' '
+	git archive --strip-components=3 HEAD >strip3.tar &&
+	(
+		mkdir strip3 &&
+		cd strip3 &&
+		"$TAR" xf ../strip3.tar &&
+		find . | grep -v "^\.\$" | sort >../strip3.lst
+	) &&
+	sed -ne "s-\([^/]*/\)\{3\}-./-p" a.lst >expect &&
+	test_cmp expect strip3.lst &&
+	diff -r a/long_path_to_a_file/long_path_to_a_file strip3
+'
+
 test_expect_success 'git archive with --output, override inferred format' '
 	git archive --format=tar --output=d4.zip HEAD &&
 	test_cmp_bin b.tar d4.zip
-- 
2.39.2
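
PS: For anyone who wants to play with the stripping rule outside of
Git, here is a minimal standalone sketch of it.  It is not part of the
patch.  The function name strip_leading_components is made up for this
illustration, and unlike write_archive_entry(), which works on a base
directory plus a length, it takes a complete NUL-terminated path.  A
NULL return stands for "entry is skipped", which is my simplified
reading of the "silently skipped" rule in the documentation hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only; it does not exist in Git.
 * Returns a pointer into `path` just past its first `n` components, or
 * NULL if the path has too few components or nothing would be left.
 */
const char *strip_leading_components(const char *path, int n)
{
	assert(path != NULL && n >= 0);

	for (int i = 0; i < n; i++) {
		const char *slash = strchr(path, '/');
		if (!slash)
			return NULL;	/* fewer than n leading elements */
		path = slash + 1;
	}
	/* A directory name that was stripped away entirely leaves nothing. */
	return *path ? path : NULL;
}
```

With this sketch, "sha1dc/sha1.c" stripped by one component yields
"sha1.c", while a top-level "Makefile" stripped by one component
yields NULL, i.e. the entry would be left out of the archive.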