Am 02.01.2018 um 07:58 schrieb suzuki toshiya: > Dear René , > > René Scharfe wrote: >> Am 29.12.2017 um 15:05 schrieb suzuki toshiya: >>> The ownership of files created by git-archive is always >>> root:root. Add --owner and --group options which work >>> like the GNU tar equivalent to allow overriding these >>> defaults. >> >> In which situations do you use the new options? >> >> (The sender would need to know the names and/or IDs on the receiving >> end. And the receiver would need to be root to set both IDs, or be a >> group member to set the group ID; I guess the latter is more common.) > > Thank you for asking the background. > > In the case that additional contents are appended to the tar file > generated by git-archive, the part by git-archive and the part > appended by common tar would have different UID/GID, because common > tar preserves the UID/GID of the original files. > > Of cource, both of GNU tar and bsdtar have the options to set > UID/GID manually, but their syntax are different. > > In the recent source package of poppler (poppler.freedesktop.org), > there are 2 sets of UID/GIDs are found: > https://poppler.freedesktop.org/poppler-0.62.0.tar.xz > > I've discussed with the maintainers of poppler, and there was a > suggestion to propose a feature to git. > https://lists.freedesktop.org/archives/poppler/2017-December/012739.html So this is in the context of generating release tarballs that contain untracked files as well. That's done in Git's own Makefile, too: dist: git-archive$(X) configure ./git-archive --format=tar \ --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar @mkdir -p $(GIT_TARNAME) @cp configure $(GIT_TARNAME) @echo $(GIT_VERSION) > $(GIT_TARNAME)/version @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version $(TAR) rf $(GIT_TARNAME).tar \ $(GIT_TARNAME)/configure \ $(GIT_TARNAME)/version \ $(GIT_TARNAME)/git-gui/version @$(RM) -r $(GIT_TARNAME) gzip -f -9 $(GIT_TARNAME).tar Having files with different owners and groups is a non-issue when extracting with --no-same-owner, which is the default for regular users. I assume this covers most use cases in the wild. The generated archive leaks the IDs of the user preparing the archive in the appended entries for untracked files. I think that's more of a concern. Publishing a valid non-root username on your build system may invite attackers. Changing the build procedure to set owner and group to root as well as UID and GID to zero seems like a better idea. This is complicated by the inconsistent command line options for GNU tar and bsdtar, as you mentioned. So how about making it possible to append untracked files using git archive? This could simplify the dist target for Git as well. It's orthogonal to adding the ability to explicitly specify owner and group, but might suffice in most (all?) cases. Not sure what kind of file name transformation abilities would be needed and how to package them nicely. The --transform option of GNU tar with its sed replace expressions seems quite heavy for me. With poppler it's only used to add the --prefix string; I'd expect that to be done for all appended files anyway. Perhaps something like --append-file=<FILE> with no transformation feature is already enough for most cases? >> Would it make sense to support the new options for ZIP files as well? > > I was not aware of the availability of UID/GID in pkzip file format... > Oh, checking APPNOTE.TXT ( https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ), > there is a storage! (see 4.5.7-Unix Extra Field). But it seems > that current git-archive emits pkzip without the field. Indeed. Git doesn't track owners and groups, so it doesn't make sense to emit that kind of information so far. If git archive grows options to specify such meta-information then it should be supported by all archive formats (or documented to be tar-specific). However, the UNIX Extra Field in ZIP archives only allows to store UID and GID (no names), which is useless unless the sender knows the ID range of the receiver -- which is unlikely when distributing software on the Internet. And even then it won't work with Windows, which has long Security Identifiers (SIDs) instead. So these are more advantages for letting git archive append untracked files: It's format-agnostic and more portable. [snipped interesting history of security-related tar options] Btw. I like how bsdtar has --insecure as a synonym for -p (preserve file permissions when extracting). It's a bit sad that this is still the default for root, though. OpenBSD cut that behavior out of their tar almost 20 years ago. (An evil tar archive could be used to fill the quota of unsuspecting users, or add setuid executables.) >>> +#if ULONG_MAX > 0xFFFFFFFFUL >>> + /* >>> + * --owner, --group rejects uid/gid greater than 32-bit >>> + * limits, even on 64-bit platforms. >>> + */ >>> + if (ul > 0xFFFFFFFFUL) >>> + return STR_IS_DIGIT_TOO_LARGE; >>> +#endif >> >> The #if is not really necessary, is it? Compilers should be able to >> optimize the conditional out on 32-bit platforms. > > Thanks for finding this, I'm glad to have a chance to ask a > question; git is not needed to care for 16-bit platforms? I'm not aware of attempts to compile Git on 16-bit systems, but I doubt it would be much fun to use the result anyway. It couldn't handle files bigger than 64KB for a start. >> GNU tar and bsdtar show the names of owner and group with -t -v at >> least, albeit in slightly different formats. Can this help avoid >> parsing the archive on our own? > > Yeah, writing yet another tar archive parser in C, to avoid the additional > dependency to Python or newer Perl (Archive::Tar since perl-5.10), is > painful, I feel (not only for me but also for the maintainers). > If tar command itself works well, it would be the best. > > But, I'm not sure whether the format of "tar tv" output is stably > standardized. It's the reason why I wrote Python tool. If I execute > git-archive with sufficently long randomized username & uid in > several times, it would be good test? The append-untracked feature would avoid that issue as well. A unique username doesn't have to be long or random if you control all other strings that "tar tv" would show (in particular file names). You could e.g. use "UserName" and then simply grep for it. Checking that git archive doesn't add unwanted characters is harder if the format isn't strictly known -- how to reliably recognize it sneaking in an extra space or slash? Not sure if such a test is really needed, though. >> But getting a short program like zipdetails for tar would be nice as >> well of course. :) > > I wrote something in C: > https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c OK. (But the append-untracked feature could be tested without such a helper. :) Just some loose comments instead of a real review: You could get the line count down by including Git's tar.h and parse-options.h (Documentation/technical/api-parse-options.txt). > but if somebody wants the support of other tar variants, > he/she would have some headache :-) Yeah, the history of tar and its various platform-specific formats fills volumes, no doubt. René