Re: [GSoC][PATCH 2/2] refs: add name and content check for file backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 30, 2024 at 08:27:53PM +0800, shejialuo wrote:
> The existing git-fsck does not fully check bad format ref name such as
> standalone '@' or name ending with ".lock". And also `git-fsck` does not
> fully check ref content, the following contents will not be reported by
> the original git-fsck command.
> 
> 1. "c71dba5654404b9b0d378a6cbff1b743b9d31a34 garbage": with trailing
>    characters.
> 2. "c71dba5654404b9b0d378a6cbff1b743b9d31a34    ": with trailing empty
>    characters.
> 3. "C71DBA5654404b9b0d378a6cbff1b743b9d31A34": with uppercase hex.
> 4. "z71dba5654404b9b0d378a6cbff1b743b9d31a34": with not hex character.
> 
> Provide functionality to support such consistency checks for branch and
> tag refs and add a new unit test to verify this functionality.

I would recommend to only introduce one check per commit. This commit
introduces multiple different checks to the files backend, which makes
it harder to review.

[snip]
> +static int files_fsck_refs_content(const char *refs_check_dir,
> +				struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		error(_("%s/%s: unable to read the ref"), refs_check_dir, iter->basename);
> +		goto clean;
> +	}
> +
> +	/*
> +	 * Case 1: check if the ref content length is valid and the last
> +	 * character is a newline.
> +	 */
> +	if (ref_content.len != the_hash_algo->hexsz + 1 ||
> +			ref_content.buf[ref_content.len - 1] != '\n') {
> +		goto failure;
> +	}
> +
> +	/*
> +	 * Case 2: the content should be range of [0-9a-f].
> +	 */
> +	for (size_t i = 0; i < the_hash_algo->hexsz; i++) {
> +		if (!isdigit(ref_content.buf[i]) &&
> +				(ref_content.buf[i] < 'a' || ref_content.buf[i] > 'f')) {
> +			goto failure;
> +		}
> +	}
> +
> +	strbuf_release(&ref_content);
> +	return 0;
> +
> +failure:
> +	error(_("%s/%s: invalid ref content"), refs_check_dir, iter->basename);
> +
> +clean:
> +	strbuf_release(&ref_content);
> +	return -1;

Can we reuse `parse_loose_ref_contents()` to get rid of much of the
duplication here? That function would need to learn to indicate to the
caller that there is trailing data, but other than that it's already as
strict as we want it to be.

Other than that, Junio has already added quite a bunch of comments, so
I'll refrain from repeating them here.

Thanks!

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux