Re: [GSoC][PATCH v13 10/10] fsck: add ref content check for files backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 29, 2024 at 09:27:56PM +0800, shejialuo wrote:
> Enhance the git-fsck(1) command by adding a check for reference content
> in the files backend. The new functionality ensures that symrefs, real
> symbolic link and regular refs are validated correctly.
> 
> In order to check the trailing content of the regular refs, add a new
> parameter `trailing` to `parse_loose_ref_contents`.
> 
> For symrefs, `parse_loose_ref_contents` will set the "referent".
> However, symbolic link could be either absolute or relative. Use
> "strbuf_add_real_path" to read the symbolic link and convert the
> relative path to absolute path. Then use "skip_prefix" to make it align
> with symref "referent".
> 
> Thus, the symrefs and symbolic links could share the same interface. Add
> a new function "files_fsck_symref_target" which aims at checking the
> following things:
> 
> 1. whether the pointee is under the `refs/` directory.
> 2. whether the pointee name is correct.
> 3. whether the pointee path is a wrong type in filesystem.
> 
> Last, add the following FSCK MESSAGEs:
> 
> 1. "badRefContent(ERROR)": A ref has a bad content
> 2. "badSymrefPointee(ERROR)": The pointee of a symref is bad.
> 3. "trailingRefContent(WARN)": A ref content has trailing contents.

I think it would have been fine to stop at the preceding commit as it
clearly demonstrates how the whole infrastructure is supposed to work.
Additional checks like those you add here would then be a good candidate
for a separate patch series. This would help you get the first patch
series landed faster as you really only have to focus on setting up the
baseline infrastructure.

Feel free to keep or drop this patch as you prefer though, I don't want
to discourage you aiming higher. Just keep in mind that the more you add
on top the longer it takes to land a patch series :)

> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index d8e437a043..8fe24a960e 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -19,9 +19,15 @@
>  `badParentSha1`::
>  	(ERROR) A commit object has a bad parent sha1.
>  
> +`badRefContent`::
> +	(ERROR) A ref has a bad content.
> +

s/a bad/bad

> +static int files_fsck_refs_content(struct fsck_options *o,
> +				   const char *gitdir,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf pointee_path = STRBUF_INIT,
> +		      ref_content = STRBUF_INIT,
> +		      abs_gitdir = STRBUF_INIT,
> +		      referent = STRBUF_INIT,
> +		      refname = STRBUF_INIT;

Nit: I think it's more customary to start each of the lines with `struct
strbuf`. Not a 100% certain on this one, though.

> +	const char *trailing = NULL;
> +	struct fsck_refs_info info;
> +	int failure_errno = 0;
> +	unsigned int type = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	info.path = refname.buf;
> +
> +	/*
> +	 * If the file is a symlink, we need to only check the connectivity
> +	 * of the destination object.
> +	 */
> +	if (S_ISLNK(iter->st.st_mode)) {
> +		const char *pointee_name = NULL;
> +
> +		strbuf_add_real_path(&pointee_path, iter->path.buf);
> +
> +		strbuf_add_absolute_path(&abs_gitdir, gitdir);
> +		strbuf_normalize_path(&abs_gitdir);
> +		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
> +			strbuf_addch(&abs_gitdir, '/');
> +
> +		if (!skip_prefix(pointee_path.buf,
> +				 abs_gitdir.buf, &pointee_name)) {
> +			ret = fsck_refs_report(o, NULL, &info,
> +					       FSCK_MSG_BAD_SYMREF_POINTEE,
> +					       "point to target outside gitdir");
> +			goto clean;
> +		}
> +
> +		ret = files_fsck_symref_target(o, &info, refname.buf,
> +					       pointee_name, pointee_path.buf);
> +		goto clean;
> +	}
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = error_errno(_("%s/%s: unable to read the ref"),
> +				  refs_check_dir, iter->relative_path);
> +		goto clean;
> +	}
> +
> +	if (parse_loose_ref_contents(ref_content.buf, &oid,
> +				     &referent, &type,
> +				     &failure_errno, &trailing)) {
> +		ret = fsck_refs_report(o, NULL, &info,
> +				       FSCK_MSG_BAD_REF_CONTENT,
> +				       "invalid ref content");
> +		goto clean;
> +	}
> +
> +	/*
> +	 * If the ref is a symref, we need to check the destination name and
> +	 * connectivity.
> +	 */
> +	if (referent.len && (type & REF_ISSYMREF)) {
> +		strbuf_addf(&pointee_path, "%s/%s", gitdir, referent.buf);
> +		strbuf_rtrim(&referent);
> +
> +		ret = files_fsck_symref_target(o, &info, refname.buf,
> +					       referent.buf, pointee_path.buf);
> +		goto clean;
> +	} else {
> +		if (trailing && (*trailing != '\0' && *trailing != '\n')) {

In case the ref ends with a newline, should we check that the next
character is `\0`? Otherwise, it may contain multiple lines, which is
not allowed for a normal ref.

Also, shouldn't the ref always end with a newline?

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux