Am 01.05.19 um 20:09 schrieb Jeff King: > On Mon, Apr 29, 2019 at 05:32:50PM -0400, Johannes Schindelin wrote: > >>> Another is that I am not sure how your "fixed format" argument >>> meshes with the "-b blocksize" parameter to affect the tar/pax >>> output. The format may be fixed, but it is parameterized. If >>> we ever need to grow the ability to take "-b", having the knowledge >>> that our current code is limited to the fixed BLOCKSIZE in a single >>> function (i.e. the caller of this function , not the callee) would >>> be less error prone. >> >> This argument would hold a lot more water if the following lines were not >> part of archive-tar.c: >> >> #define RECORDSIZE (512) >> #define BLOCKSIZE (RECORDSIZE * 20) >> >> static char block[BLOCKSIZE]; >> >> If you can tell me how the `-b` (run-time) parameter can affect the >> (compile-time) `BLOCKSIZE` constant, maybe I can start to understand your >> concern. > > FWIW, I agree with you here. These patches are not making anything worse > (and may even make them better, since we'd probably need to swap out the > BLOCKSIZE constant for a run-time "blocksize" variable in fewer places). The block size is mostly relevant for writing tar archives to magnetic tapes. You can do that with git archive and a tape drive that supports the blocking factor 20, which is the default for GNU tar and thus should be quite common. You may get higher performance with a higher blocking factor, if supported. But so far this didn't come up on the mailing list, and I'd be surprised if people really wrote snapshots of git archives directly to tape. So I'm not too worried about this define ever becoming a user-settable option. Sealing the constant into a function a bit feels dirty, though. Mixing code and data makes the code more brittle. Another example of that is the hard-coded file descriptor in the same function, by the way. It's a lot of busywork to undo in order to gain the ability to write to some other fd, for the questionable convenience of not having to pass that parameter along the call chain. My bad. But anyway, I worry more about the fact that blocking is not needed when gzip'ing; gzwrite can be fed pieces of any size, not just 20 KB chunks. The tar writer just needs to round up the archive size to a multiple of 20 KB and pad with NUL bytes at the end, in order to produce the same uncompressed output as non-compressing tar. If we'd wanted to be tape-friendly, then we'd have to block the gzip'ed output instead of the uncompressed tar file, but I'm not suggesting doing that. Note to self: I wonder if moving the blocking part out into an asynchronous function could simplify the code. René