Re: [PATCH v9 09/10] Add reftable library

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 22 Apr 2020 19:35:46 +0200 (CEST)

Hi,

On Mon, 20 Apr 2020, Junio C Hamano wrote:

> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > diff --git a/reftable/update.sh b/reftable/update.sh
> > new file mode 100644
>
> Shouldn't this be executable, even though the readme tells the
> reader to run "sh update.sh"?

Do we really want to have such a shell script? To be honest, I'd rather do
a subtree merge. In my mind, this should make it easier to maintain this
code elsewhere, yet debug and fix it in git.git (because there is no doubt
whatsoever in my mind that the vast majority of bug fixes in this code
will come in via git.git, which _does_ make me wonder whether it makes
sense to keep the reftable library separate, as if it was truly an
independent part of Git: it is absolutely not).

Of course, this would make it even more desirable to do away with code
that reads like Java (and is probably a silent, defiant protest against
Git's requirement to have no declarations after statements), e.g.

	int block_iter_next(struct block_iter *it, struct record rec)
	{
		if (it->next_off >= it->br->block_len) {
			return 1;
		}

		{
			struct slice in = {
				.buf = it->br->block.data + it->next_off,
				.len = it->br->block_len - it->next_off,
			};
			struct slice start = in;
			struct slice key = { 0 };
			byte extra;
			int n = decode_key(&key, &extra, it->last_key, in);
			if (n < 0) {
				return -1;
			}

			[...]
		}
	}

The fact that this code uses a coding style that is very different from
Git's, on purpose, makes it quite awkward to even so much as debug, say,
the segmentation fault that was caused by my rewrite of the test script
(which _also_ looks _very_ different from the existing test cases, adding
to the awkwardness):

	test_expect_success 'basic operation of reftable storage' '
		rm -rf .git &&
		git init --ref-storage=reftable &&
		mv .git/hooks .git/hooks-disabled &&
		test_commit file &&
		test_write_lines HEAD refs/heads/master refs/tags/file >expect &&
		git show-ref &&
		git show-ref | cut -f2 -d" " > actual &&
		test_cmp actual expect &&
		for count in $(test_seq 1 10)
		do
			test_commit "number $count" file.t $count number-$count ||
			return 1
		done &&
		git gc &&
		ls -1 .git/reftable >table-files &&
		test_line_count = 2 table-files &&
		git reflog refs/heads/master >output &&
		test_line_count = 11 output &&
		grep "commit (initial): first post" output &&
		grep "commit: number 10" output
	'

The fact that this rather trivial usage can still cause a segmentation
fault makes me believe that this rationale for the purposefully-different
coding style implies a fundamental fallacy (see
https://github.com/git-for-windows/git/commit/00eb1fd4496ef763f840878fb5249507ae83af43#commitcomment-38614650):

	Each time you use an unitialized variable, you put another mental burden
	on the reader to check that it is actually OK in that place, and the Git
	source code (which puts the declaration and use) far apart makes this
	harder to verify.

The major challenge with writing C code these days, especially when you
come from a background of Java or other languages that take care of memory
management "for you" is that you _do_ have to manage memory in C,
explicitly. And if you already start not paying attention to the
initialization of local variables, you go down the path where all of a
sudden you end up with a segmentation fault and have no idea where it is
coming from, and the most likely explanation is that the memory was
managed incorrectly.

So I would recommend to actually not get sloppy with confusing declaration
and initialization and initializing unnecessarily and trying to get cute
with introducing code blocks "just to introduce a narrower scope for local
variables".

Instead, I would suggest to stick with the style Git developed; We have
gotten relatively good with keeping our memory management straight that
way.

Ciao,
Dscho