On Thu, Jun 13, 2024 at 12:38:45PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@xxxxxxxxx> writes:
>
> > In order to check the trailing content, add a new parameter
> > "trailing" to "parse_loose_ref_contents" function.
>
> About this one.
>
> >  int parse_loose_ref_contents(const char *buf, struct object_id *oid,
> >  			     struct strbuf *referent, unsigned int *type,
> > -			     int *failure_errno)
> > +			     int *failure_errno, unsigned int *trailing)
> >  {
> >  	const char *p;
> >  	if (skip_prefix(buf, "ref:", &buf)) {
> > @@ -607,6 +607,10 @@ int parse_loose_ref_contents(const char *buf, struct object_id *oid,
> >  		*failure_errno = EINVAL;
> >  		return -1;
> >  	}
> > +
> > +	if (trailing && (*p != '\0' && *p != '\n'))
> > +		*trailing = 1;
> > +
> >  	return 0;
> >  }
>
> We know what the garbage looked like at this point. The caller owns
> the "buf" pointer and we are pointing into that buffer with the
> pointer p, and the garbage is right there.
>
> So I am not sure if losing information by using "uint *" is a good
> idea. Wouldn't it make more sense to take "const char **trailing"
> as a parameter and tell the caller where the trailing junk begins?
>

Yes, I totally agree that using "uint *" loses a lot of information
here. Actually, I did try "const char **trailing", but I made a mistake
that resulted in a wild pointer. This is because when
"parse_loose_ref_contents" handles a symref, it never reaches the code
that looks at `*p`. So if the caller declares `const char *trailing`
without initializing it, it is left as a wild pointer. I think we could
set it to `NULL` when handling a symref. I will change the code in the
next version.

> > +static int files_fsck_symref(struct fsck_refs_options *o,
> > +			     struct strbuf *refname,
> > +			     struct strbuf *path)
>
> This does not take things like HEAD or refs/remotes/origin/HEAD to
> validate. Instead, the caller is responsible for either doing a
> readlink on a symbolic link, or reading a textual symref and
> stripping "ref: " prefix from it, before calling this function.
> The "refname" parameter is not HEAD or refs/remotes/origin/HEAD but
> the pointee of the symref.
>
> So I'd imagine that renaming it to fsck_symref_target or along that
> line to clarify that we are not checking the symref, but the target
> of a symref, would be a good idea.
>

That's not correct. The "refname" parameter is EXACTLY the symref
itself. What we do here is check the "path" parameter. There are two
situations:

1. For a symref, we strip the "ref: " prefix and combine the gitdir
   with the stripped content to get the "path" parameter.
2. For a symbolic link, we get its actual path (here I made a mistake:
   I totally forgot the case where it points to an absolute path. I
   will revise the code to handle it).

The design here just checks whether the symref points to the correct
thing. It does not care about the pointee. The code will traverse every
regular file under the "refs/" directory, so eventually we will check
the "pointee" status as well. For example, take a symref "sym-branch"
and a regular ref "branch":

  sym-branch: "ref: refs/heads/branch"
  branch: "xxxx"

The design will not report any error for "sym-branch". I think we
should discuss here whether this design is OK.

> > +{
> > +	struct stat st;
> > +	int ret = 0;
> > +
> > +	if (lstat(path->buf, &st) < 0) {
> > +		ret = fsck_refs_report(o, refname->buf,
> > +				       FSCK_MSG_DANGLING_SYMREF,
> > +				       "point to non-existent ref");
> > +		goto out;
> > +	}
>
> Is that an error? Just like being on an unborn branch is not an
> error, it could be argued that a symref that points at a branch yet
> to be born wouldn't be an error, either, no?
>

The reason why I chose "danglingSymref" with warn severity is that I
wanted the code to align with "git checkout". When we use "git
checkout" on a dangling symref, it produces the following output:

  $ git checkout branch-3
  warning: ignoring dangling symref refs/heads/branch-3
  error: pathspec 'branch-3' did not match any file(s) known to git

So I prefer warn severity.
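
As a rough illustration of the severity split under discussion (this is
not the code from the patch), the lstat-based check could be reduced to a
small classifier. The function name `classify_symref_target` and the enum
values are hypothetical; only lstat(2) and the S_ISREG/S_ISLNK macros are
real. Like the quoted hunk, the sketch treats any lstat failure as
"missing", which is a simplification.

```c
/* Expose lstat() and S_ISLNK() under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical result codes for this sketch; not part of git. */
enum symref_target_status {
	SYMREF_TARGET_OK,
	SYMREF_TARGET_MISSING,  /* dangling: candidate for warn severity */
	SYMREF_TARGET_BAD_TYPE  /* e.g. a directory: candidate for an error */
};

/*
 * Classify the filesystem entry a symref resolves to. "path" is assumed
 * to be the already-resolved target path, as in the quoted function.
 */
enum symref_target_status classify_symref_target(const char *path)
{
	struct stat st;

	/* Any lstat failure is treated as "missing", as in the quoted hunk. */
	if (lstat(path, &st) < 0)
		return SYMREF_TARGET_MISSING;

	/* Only regular files and symbolic links are plausible loose refs. */
	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode))
		return SYMREF_TARGET_BAD_TYPE;

	return SYMREF_TARGET_OK;
}
```

A caller could then map SYMREF_TARGET_MISSING to a warning and
SYMREF_TARGET_BAD_TYPE to an error, keeping the two cases under
separate message IDs.
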
> > +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> > +		ret = fsck_refs_report(o, refname->buf,
> > +				       FSCK_MSG_DANGLING_SYMREF,
> > +				       "point to invalid object");
> > +		goto out;
>
> The use of "object" here is highly misleading. Yes, you can call a
> filesystem entity like "a regular file", "a directory", etc. "an
> object", but the word can refer to many other kinds of "object". In
> fact, I originally read this to mean "we are referring to an object
> in the object database that is corrupt" or something, but of course
> that is not what we are complaining about. We are complaining that
> the symbolic link points at a file of wrong type (like a directory).
>

Yes, it is quite misleading. I will clean up the code and the commit
message (I also used "object" in the commit message).

> So, in short, missing is probably OK. Pointing at a wrong thing
> (like a directory or block device) is not. Where, if any, do we
> catch a symbolic ref that tries to escape the refs/* hierarchy
> (e.g. ".git/refs/heads/my-crazy-ref" that is a symbolic link that
> points at "../../../../else/where" that is not even part of the
> repository), by the way?
>

I intentionally ignored the "escape" situation. The path could be
either absolute or relative, so it may be a little complicated. I will
find a way to support this in the next version.

> Thanks.

Thanks,
Jialuo