Re: [GSoC][PATCH v2 7/7] fsck: add ref content check for files backend

On Thu, Jun 13, 2024 at 12:38:45PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@xxxxxxxxx> writes:
> 
> > In order to check the trailing content, add a new parameter
> > "trailing" to "parse_loose_ref_contents" function.
> 
> About this one.
> 
> >  int parse_loose_ref_contents(const char *buf, struct object_id *oid,
> >  			     struct strbuf *referent, unsigned int *type,
> > -			     int *failure_errno)
> > +			     int *failure_errno, unsigned int *trailing)
> >  {
> >  	const char *p;
> >  	if (skip_prefix(buf, "ref:", &buf)) {
> > @@ -607,6 +607,10 @@ int parse_loose_ref_contents(const char *buf, struct object_id *oid,
> >  		*failure_errno = EINVAL;
> >  		return -1;
> >  	}
> > +
> > +	if (trailing && (*p != '\0' && *p != '\n'))
> > +		*trailing = 1;
> > +
> >  	return 0;
> >  }
> 
> We know what the garbage looked like at this point.  The caller owns
> the "buf" pointer and we are pointing into that buffer with the
> pointer p, and the garbage is right there.
> 
> So I am not sure if losing information by using "uint *" is a good
> idea.  Wouldn't it make more sense to take "const char **trailing"
> as a parameter and tell the caller where the trailing junk begins?
> 

Yes, I totally agree that using "uint *" loses a lot of information
here. Actually, I first used "const char **trailing", but I made a
mistake that left the caller with a wild pointer. When
"parse_loose_ref_contents" handles a symref, it returns before `*p` is
ever examined, so the output parameter is never assigned. If the caller
declares `const char *trailing` without initializing it, it is a wild
pointer. I think we could set it to `NULL` when handling a symref.

I will change the code in the next version.
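
To make the idea concrete, here is a minimal sketch of the interface
discussed above. It is not the real "parse_loose_ref_contents": the
function names, the SKETCH_HEXSZ constant and the fixed 40-character
hex length are assumptions made only for illustration. It shows the
two points from this reply: the out-parameter is set to NULL for a
symref, and otherwise points at the first byte after the object ID.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HEXSZ 40 /* assumption: SHA-1 sized hex object ID */

/*
 * Hypothetical stand-in for parse_loose_ref_contents(); not git's code.
 * Returns 0 on success, -1 if buf is neither a symref nor a hex ID.
 * On return, *trailing (when requested) is NULL for a symref or on
 * failure, and otherwise points into buf right after the object ID.
 */
static int parse_ref_sketch(const char *buf, const char **trailing)
{
	size_t i;

	if (trailing)
		*trailing = NULL; /* never leave the caller's pointer wild */

	if (!strncmp(buf, "ref:", 4))
		return 0; /* symref: no object ID, so no trailing pointer */

	/* A short buffer stops at its NUL, which is not a hex digit. */
	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return -1;

	if (trailing)
		*trailing = buf + i;
	return 0;
}

/* Same condition as in the patch: anything but NUL or newline is junk. */
static int has_trailing_garbage(const char *trailing)
{
	return trailing && *trailing != '\0' && *trailing != '\n';
}
```

With this shape the caller can both detect the garbage and print it,
because the pointer still refers to the caller's own buffer.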

> > +static int files_fsck_symref(struct fsck_refs_options *o,
> > +			     struct strbuf *refname,
> > +			     struct strbuf *path)
> 
> This does not take things like HEAD or refs/remotes/origin/HEAD to
> validate.  Instead, the caller is responsible for either doing a
> readlink on a symbolic link, or reading a textual symref and
> stripping "ref: " prefix from it, before calling this function.
> The "refname" parameter is not HEAD or refs/remotes/origin/HEAD but
> the pointee of the symref.
> 
> So I'd imagine that renaming it to fsck_symref_target or along that
> line to clarify that we are not checking the symref, but the target
> of a symref, would be a good idea.
> 

That's not correct. The "refname" parameter is EXACTLY the symref
itself. What we check here is the "path" parameter. There are two
situations:

1. For a textual symref, we strip the "ref: " prefix and combine the
gitdir with the stripped content to get the "path" parameter.
2. For a symbolic link, we get its actual path (here I made a mistake:
I totally forgot the case where it points to an absolute path. I will
revise the code to handle it).

The design here just checks whether the symref points to the correct
thing. It does not care about the content of the pointee. Because the
code traverses every regular file under the "refs/" directory, the
pointee itself will eventually be checked. For example, take a symref
"sym-branch" and a regular ref "branch":

  sym-branch: "ref: refs/heads/branch".
  branch: "xxxx"

The design will not report any error for "sym-branch". I think we should
discuss here whether this design is OK.

> > +{
> > +	struct stat st;
> > +	int ret = 0;
> > +
> > +	if (lstat(path->buf, &st) < 0) {
> > +		ret = fsck_refs_report(o, refname->buf,
> > +				       FSCK_MSG_DANGLING_SYMREF,
> > +				       "point to non-existent ref");
> > +		goto out;
> > +	}
> 
> Is that an error?  Just like being on an unborn branch is not an
> error, it could be argued that a symref that points at a branch yet
> to be born wouldn't be an error, either, no?
> 

The reason I chose "danglingSymref" with warn severity is to align the
code with "git checkout". When we run "git checkout" on a dangling
symref, it produces the following output:

  $ git checkout branch-3
  warning: ignoring dangling symref refs/heads/branch-3
  error: pathspec 'branch-3' did not match any file(s) known to git

So I prefer warn severity.
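
The two cases in the quoted hunks (target missing versus target of the
wrong file type) could be kept apart as in the rough sketch below. The
enum and the function name are hypothetical and exist only for this
illustration; how each state maps to a fsck message and severity is
exactly what is under discussion, so the sketch does not decide it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical classification, not part of the patch. */
enum target_state {
	TARGET_OK,       /* regular file or symbolic link */
	TARGET_MISSING,  /* lstat() failed: candidate for a warning */
	TARGET_BAD_TYPE  /* e.g. a directory or a block device */
};

static enum target_state classify_symref_target(const char *path)
{
	struct stat st;

	/* This sketch treats every lstat() failure as "missing". */
	if (lstat(path, &st) < 0)
		return TARGET_MISSING;

	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode))
		return TARGET_BAD_TYPE;

	return TARGET_OK;
}
```

Keeping the states separate would let the missing case stay a warning
(or nothing at all) while the wrong-type case gets its own message.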

> > +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> > +		ret = fsck_refs_report(o, refname->buf,
> > +				       FSCK_MSG_DANGLING_SYMREF,
> > +				       "point to invalid object");
> > +		goto out;
> 
> The use of "object" here is highly misleading.  Yes, you can call a
> filesystem entity like "a regular file", "a directory", etc. "an
> object", but the word can refer to many other kinds of "object".  In
> fact, I originally read this to mean "we are referring to an object
> in the object database that is corrupt" or something, but of course
> that is not what we are complaining about. We are complaining that
> the symbolic link points at a file of wrong type (like a directory).
> 

Yes, the wording is very misleading here. I will clean up the code and
the commit message (I also used "object" in the commit message).

> So, in short, missing is probably OK.  Pointing at a wrong thing
> (like a directory or block device) is not.  Where, if any, do we
> catch a symbolic ref that tries to escape the refs/* hierarchy
> (e.g. ".git/refs/heads/my-crazy-ref" that is a symbolic link that
> points at "../../../../else/where" that is not even part of the
> repository), by the way?
> 

I intentionally ignored the "escape" situation. The path could be
either absolute or relative, so it may be a little complicated. I will
find a way to support this in the next version.
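
One possible direction for the relative case, only as a sketch: walk
the link target lexically and count directory depth below "refs/". The
helper name and the "start_depth" parameter are hypothetical, and the
sketch makes two assumptions: absolute targets are flagged here and
would need a separate comparison against the absolute refs directory,
and intermediate symbolic links are not resolved.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git. "target" is a symlink target
 * and "start_depth" is how many directories the link sits below the
 * root that must not be left (1 for a link in refs/heads/).
 * Returns 1 if the lexical walk climbs above that root, or if the
 * target is absolute; returns 0 otherwise.
 */
static int target_escapes_root(int start_depth, const char *target)
{
	int depth = start_depth;

	if (*target == '/')
		return 1; /* assumption: absolute paths are checked elsewhere */

	while (*target) {
		size_t len = strcspn(target, "/");

		if (len == 2 && !strncmp(target, "..", 2)) {
			if (--depth < 0)
				return 1; /* climbed above the root */
		} else if (len && !(len == 1 && *target == '.')) {
			depth++; /* ordinary component; "." is skipped */
		}

		target += len;
		while (*target == '/')
			target++; /* collapse repeated slashes */
	}
	return 0;
}
```

For the example in the review, a link in "refs/heads/" pointing at
"../../../../else/where" starts at depth 1 and goes negative on the
second "..", so it would be reported as escaping.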

> Thanks.

Thanks,
Jialuo
