Dear René,
René Scharfe wrote:
Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.
In which situations do you use the new options?
(The sender would need to know the names and/or IDs on the receiving
end. And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)
Thank you for asking about the background.
When additional contents are appended to a tar file generated by
git-archive, the part written by git-archive and the part appended
by a common tar end up with different UID/GID, because a common
tar preserves the UID/GID of the original files.
Of course, both GNU tar and bsdtar have options to set the
UID/GID manually, but their syntaxes differ.
In a recent source package of poppler (poppler.freedesktop.org),
two sets of UID/GID are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz
I discussed this with the maintainers of poppler, and there was a
suggestion to propose the feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html
So now I'm trying.
Would it make sense to support the new options for ZIP files as well?
I was not aware that the pkzip file format can hold UID/GID...
Oh, checking APPNOTE.TXT (
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
there is a storage for them! (see 4.5.7-Unix Extra Field). But it seems
that current git-archive emits pkzip without this field.
The background for proposing the options for the tar format is described
above. Do pkzip users hope for something similar? If it is required,
I will try.
+--owner=<name>[:<uid>]::
+ Force <name> as owner and <uid> as uid for the files in the tar
+ archive. If <uid> is not supplied, <name> can be either a user
+ name or numeric UID. In this case the missing part (UID or
+ name) will be inferred from the current host's user database.
+
+--group=<name>[:<gid>]::
+ Force <name> as group and <gid> as gid for the files in the tar
+ archive. If <gid> is not supplied, <name> can be either a group
+ name or numeric GID. In this case the missing part (GID or
+ name) will be inferred from the current host's group database.
+
IIUC the default behavior is kept, i.e. without these options the
archive entries appear to be owned by root:root. I think it's a good
idea to mention this here.
Indeed. The default behaviour of git-archive without these options
(root:root) differs from that of (common) tar (which preserves the
uid/gid of the files being archived), so it should be clarified.
bsdtar has --uname, --uid, --gname, and --gid, which seem simpler. At
least you could use OPT_STRING and OPT_INTEGER with them (plus a range
check). And they should be easier to explain.
Thank you very much for proposing a good alternative. Indeed, such
well-separated options make the code simple & stable. However, according
to the manual search system of FreeBSD ( https://www.freebsd.org/cgi/man.cgi ),
these options have not always been available.
FreeBSD 8.2 and earlier: --uname, --gname, --uid, --gid are unavailable
(it seems that using "mtree" was the preferred way to specify them).
FreeBSD 8.3 and later: --uname, --gname, --uid, --gid are available.
The manual says as follows:
--uid id
Use the provided user id number and ignore the user name from the
archive. On create, if --uname is not also specified, the user
name will be set to match the user id.
--uname name
Use the provided user name. On extract, this overrides the user
name in the archive; if the provided user name does not exist on
the system, it will be ignored and the user id (from the archive
or from the --uid option) will be used instead. On create, this
sets the user name that will be stored in the archive; the name
is not verified against the system user database.
Thus, to emulate (post-2012) bsdtar perfectly, getpwnam(), getpwuid() etc.
would still be needed to implement "--uid" X-(.
Having tracked the history of bsdtar, maybe I should track the history of
GNU tar too. According to the ChangeLog, even --owner and --group are rather
new options, available since 1.13.18 (released on 2000-10-29). The original
description was like this.
`--owner=USER'
Specifies that `tar' should use USER as the owner of members when
creating archives, instead of the user associated with the source
file. USER is first decoded as a user symbolic name, but if this
interpretation fails, it has to be a decimal numeric user ID.
There is no value indicating a missing number, and `0' usually
means `root'. Some people like to force `0' as the value to offer
in their distributions for the owner of files, because the `root'
user is anonymous anyway, so that might as well be the owner of
anonymous archives.
This option does not affect extraction from archives.
Oh, there is no colon-separated syntax! According to the ChangeLog, the
colon-separated syntax was introduced on 2011-08-13 and
released in GNU tar-1.27 (2013-10-06).
`--owner=USER'
Specifies that `tar' should use USER as the owner of members when
creating archives, instead of the user associated with the source
file. USER can specify a symbolic name, or a numeric ID, or both
as NAME:ID. *Note override::.
This option does not affect extraction from archives.
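To make the three accepted forms (NAME, ID, NAME:ID) concrete, here is a minimal sketch of splitting such an operand. It is not GNU tar's code; the names split_owner, struct owner_spec and SPEC_MAX are hypothetical, and treating an all-digit operand without a colon as an ID is an assumption of this sketch.

```c
#include <string.h>

#define SPEC_MAX 64 /* arbitrary limit chosen for this sketch */

/* Hypothetical holder for the two parts of an --owner operand. */
struct owner_spec {
	char name[SPEC_MAX]; /* empty string if no name was given */
	char id[SPEC_MAX];   /* empty string if no ID was given; digits only */
};

/* Return 1 if S is non-empty and consists of decimal digits only. */
static int all_digits(const char *s)
{
	return *s && s[strspn(s, "0123456789")] == '\0';
}

/*
 * Split ARG of the form NAME, ID, or NAME:ID into SPEC.
 * Returns 0 on success, -1 if ARG is empty, too long, or has a
 * non-numeric part after the colon.
 */
static int split_owner(const char *arg, struct owner_spec *spec)
{
	const char *colon = strchr(arg, ':');
	size_t name_len;

	spec->name[0] = spec->id[0] = '\0';
	if (!colon) {
		if (!*arg || strlen(arg) >= SPEC_MAX)
			return -1;
		/* assumption: an all-digit operand is an ID, anything else a name */
		strcpy(all_digits(arg) ? spec->id : spec->name, arg);
		return 0;
	}
	name_len = (size_t)(colon - arg);
	if (name_len >= SPEC_MAX || strlen(colon + 1) >= SPEC_MAX)
		return -1;
	if (!all_digits(colon + 1))
		return -1;
	memcpy(spec->name, arg, name_len);
	spec->name[name_len] = '\0';
	strcpy(spec->id, colon + 1);
	return 0;
}
```

Note that this only splits the operand; whether a missing part is then looked up in the host database is a separate decision.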
Comparing the original and current --owner descriptions, a
strange point is that the original says "USER is first
decoded as a user symbolic name, but if this interpretation fails,
it has to be a decimal numeric user ID." What? It seems to mean:
"check whether the specified username is known to the host system
and its numerical uid is resolvable; if unresolvable, try to
parse it as a decimal value; if that fails too, treat it as a fatal
error". Here I quote the related part.
tar-1.14/src/names.c

/* Given UNAME, set the corresponding UID and return 1, or else, return
   0.  */
int
uname_to_uid (char const *uname, uid_t *uidp)
{
  struct passwd *passwd;

  if (cached_no_such_uname
      && strcmp (uname, cached_no_such_uname) == 0)
    return 0;

  if (!cached_uname
      || uname[0] != cached_uname[0]
      || strcmp (uname, cached_uname) != 0)
    {
      passwd = getpwnam (uname);
      if (passwd)
        {
          cached_uid = passwd->pw_uid;
          assign_string (&cached_uname, passwd->pw_name);
        }
      else
        {
          assign_string (&cached_no_such_uname, uname);
          return 0;
        }
    }
  *uidp = cached_uid;
  return 1;
}

tar-1.14/src/tar.c

    case OWNER_OPTION:
      if (! (strlen (optarg) < UNAME_FIELD_SIZE
             && uname_to_uid (optarg, &owner_option)))
        {
          uintmax_t u;
          if (xstrtoumax (optarg, 0, 10, &u, "") == LONGINT_OK
              && u == (uid_t) u)
            owner_option = u;
          else
            FATAL_ERROR ((0, 0, "%s: %s", quotearg_colon (optarg),
                          _("Invalid owner")));
        }
      break;
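Reduced to its essence, the fallback order in the excerpt above (name first, then decimal number, else error) could look like the following sketch. It drops the caching and the field-size check, and resolve_owner is a hypothetical name, not a function of GNU tar.

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolve ARG to a uid the way the old --owner did.
 * 1. Try ARG as a user name known to the host database.
 * 2. Otherwise try ARG as a decimal uid.
 * Returns 1 and stores the uid on success, 0 if both attempts fail.
 */
static int resolve_owner(const char *arg, uid_t *uidp)
{
	struct passwd *pw = getpwnam(arg);
	char *end;
	unsigned long ul;

	if (pw) {
		*uidp = pw->pw_uid;
		return 1;
	}
	/* strtoul() would accept signs and whitespace; insist on a digit */
	if (*arg < '0' || *arg > '9')
		return 0;
	errno = 0;
	ul = strtoul(arg, &end, 10);
	if (errno || *end != '\0')
		return 0;
	/* reject values that do not survive the conversion to uid_t */
	if ((unsigned long)(uid_t)ul != ul)
		return 0;
	*uidp = (uid_t)ul;
	return 1;
}
```

So a name unknown to the host is a fatal error, but a numeric uid unknown to the host is accepted as-is.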
In summary, there are the following types.
a) older GNU tar
   --owner must be a name known to the host database or a decimal
   uid; there is no option to set uname & uid separately.
b) newer GNU tar
   --owner accepts an unknown username and/or uid.
   If only one part is given and it is known to the host system,
   the missing part is deduced from it.
   If only one part is given and it is unknown to the host system,
   the missing part is unchanged from the file to be archived.
c) newer bsdtar
   --uname/--uid accept an unknown username and/or uid.
   The username is just used to override the uname entry of the archive,
   but the uid is used to override both the uid and uname entries,
   if the uid is known and the username is not specified.
   If the uid is unknown, the uid is overridden, but the username entry
   is unchanged from the file to be archived.
Which behaviour is to be simulated? I want to propose
yet another one, similar to c) but incompatible.
d) --uname, --uid, --gname, --gid check only the syntax
   (to reject a username starting with a digit, a non-digit
   uid, etc.) and do not check known/unknown.
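As a rough illustration of d), a syntax-only check might look like the sketch below. The helper names is_valid_name and parse_id are hypothetical, and the 32-bit upper limit is borrowed from the patch as an assumption; nothing here consults the host user or group database.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper for --uname/--gname: accept any non-empty name
 * that does not start with a digit. No passwd/group lookup is done.
 */
static int is_valid_name(const char *name)
{
	if (!name || !*name || isdigit((unsigned char)name[0]))
		return 0;
	return 1;
}

/*
 * Hypothetical helper for --uid/--gid: accept a plain decimal number
 * that fits in 32 bits. Returns 0 and stores the value on success,
 * -1 on any syntax or range error.
 */
static int parse_id(const char *s, unsigned long *out)
{
	char *end;
	unsigned long ul;

	/* strtoul() would accept "-1" or " 1"; require a leading digit */
	if (!s || !isdigit((unsigned char)s[0]))
		return -1;
	errno = 0;
	ul = strtoul(s, &end, 10);
	if (errno || *end != '\0')
		return -1;
	/* always false where unsigned long is 32 bits wide */
	if (ul > 0xFFFFFFFFUL)
		return -1;
	*out = ul;
	return 0;
}
```

With such checks, the same command line would give the same archive on any host, whether or not the name or ID exists there.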
+#if ULONG_MAX > 0xFFFFFFFFUL
+ /*
+ * --owner, --group rejects uid/gid greater than 32-bit
+ * limits, even on 64-bit platforms.
+ */
+ if (ul > 0xFFFFFFFFUL)
+ return STR_IS_DIGIT_TOO_LARGE;
+#endif
The #if is not really necessary, is it? Compilers should be able to
optimize the conditional out on 32-bit platforms.
Thanks for finding this. I'm glad to have a chance to ask a
question: does git not need to care about 16-bit platforms?
+ /* the operand is known to be non-digit */
+
+ args->uname = xstrdup(tar_owner);
+ pw = getpwnam(tar_owner);
How well does this work on Windows? In daemon.c we avoid calling
getpwnam(3), getgrnam(3) etc. if NO_POSIX_GOODIES is defined.
OK, I can enclose them in ifdefs of NO_POSIX_GOODIES. But
maybe the design of the options should be discussed first.
Both the latest GNU tar and BSD tar call getpwnam() or getpwuid(),
but designing all of --uname, --uid, --gname, --gid so that "only the
syntax is checked (a non-digit UID/GID is refused), but known/unknown
is not checked" would be the most portable.
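If the database lookup were kept after all, the guard could be as small as the following sketch. The helper name lookup_uid_by_name is hypothetical; only the NO_POSIX_GOODIES macro and getpwnam() come from the discussion above.

```c
#ifndef NO_POSIX_GOODIES
#include <pwd.h>
#include <sys/types.h>
#endif

/*
 * Hypothetical helper: look up NAME in the host user database.
 * Returns 0 and stores the uid on success. Returns -1, leaving *UID
 * untouched, if the name is unknown or if the platform has no user
 * database (NO_POSIX_GOODIES defined).
 */
static int lookup_uid_by_name(const char *name, unsigned long *uid)
{
#ifndef NO_POSIX_GOODIES
	struct passwd *pw = getpwnam(name);

	if (pw) {
		*uid = (unsigned long)pw->pw_uid;
		return 0;
	}
#else
	(void)name;
	(void)uid;
#endif
	return -1;
}
```

The caller would then have to decide what to store when the lookup is unavailable, which is exactly the part that a syntax-only design avoids.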
GNU tar and bsdtar show the names of owner and group with -t -v at
least, albeit in slightly different formats. Can this help avoid
parsing the archive on our own?
Yeah, writing yet another tar archive parser in C, to avoid an additional
dependency on Python or a newer Perl (Archive::Tar, since perl-5.10), is
painful, I feel (not only for me but also for the maintainers).
If the tar command itself works well, that would be best.
But I'm not sure whether the format of the "tar tv" output is stably
standardized. That is the reason why I wrote the Python tool. If I run
git-archive several times with sufficiently long randomized usernames &
uids, would that be a good test?
But getting a short program like zipdetails for tar would be nice as
well of course. :)
I wrote something in C:
https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c
but if somebody wants support for other tar variants,
he/she would have some headache :-)
Regards,
mpsuzuki