Re: [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die()

René Scharfe <l.s.r@xxxxxx> · Thu, 2 May 2019 22:29:55 +0200

Am 01.05.19 um 20:09 schrieb Jeff King:
> On Mon, Apr 29, 2019 at 05:32:50PM -0400, Johannes Schindelin wrote:
>
>>> Another is that I am not sure how your "fixed format" argument
>>> meshes with the "-b blocksize" parameter to affect the tar/pax
>>> output.  The format may be fixed, but it is parameterized.  If
>>> we ever need to grow the ability to take "-b", having the knowledge
>>> that our current code is limited to the fixed BLOCKSIZE in a single
>>> function (i.e. the caller of this function , not the callee) would
>>> be less error prone.
>>
>> This argument would hold a lot more water if the following lines were not
>> part of archive-tar.c:
>>
>> 	#define RECORDSIZE      (512)
>> 	#define BLOCKSIZE       (RECORDSIZE * 20)
>>
>> 	static char block[BLOCKSIZE];
>>
>> If you can tell me how the `-b` (run-time) parameter can affect the
>> (compile-time) `BLOCKSIZE` constant, maybe I can start to understand your
>> concern.
>
> FWIW, I agree with you here. These patches are not making anything worse
> (and may even make them better, since we'd probably need to swap out the
> BLOCKSIZE constant for a run-time "blocksize" variable in fewer places).

The block size is mostly relevant for writing tar archives to magnetic
tapes.  You can do that with git archive and a tape drive that supports
the blocking factor 20, which is the default for GNU tar and thus should
be quite common.  You may get higher performance with a higher blocking
factor, if supported.

But so far this didn't come up on the mailing list, and I'd be surprised
if people really wrote snapshots of git archives directly to tape.  So
I'm not too worried about this define ever becoming a user-settable
option.  Sealing the constant into a function a bit feels dirty, though.
Mixing code and data makes the code more brittle.

Another example of that is the hard-coded file descriptor in the same
function, by the way.  It's a lot of busywork to undo in order to gain
the ability to write to some other fd, for the questionable convenience
of not having to pass that parameter along the call chain.  My bad.

But anyway, I worry more about the fact that blocking is not needed when
gzip'ing; gzwrite can be fed pieces of any size, not just 20 KB chunks.
The tar writer just needs to round up the archive size to a multiple of
20 KB and pad with NUL bytes at the end, in order to produce the same
uncompressed output as non-compressing tar.

If we'd wanted to be tape-friendly, then we'd have to block the gzip'ed
output instead of the uncompressed tar file, but I'm not suggesting
doing that.

Note to self: I wonder if moving the blocking part out into an
asynchronous function could simplify the code.

René