Re: [PATCH v2 3/4] ref: add symbolic ref content check for files backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 27, 2024 at 12:19:11PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@xxxxxxxxx> writes:
> 
> > In order to check the content of the symbolic ref, create a function
> > "files_fsck_symref_target". It will first check whether the "pointee" is
> > under the "refs/" directory and then we will check the "pointee" itself.
> 
> Hmph, as the pointee must be within the usual places that you would
> find refs (either in refs/ directory or pseudo ref files immediately
> below $GIT_DIR), wouldn't we check the pointee when fsck (or "git
> refs verify") run and check everything?  The pointee will have its
> turn to be checked, and I am not sure why you need to check the
> pointee when you find a symbolic ref is pointing at it, which will
> lead for it to be checked twice (or more).
> 
> I however did not find an additional code to "check the pointee itself"
> in the patch, so perhaps it is OK---the only thing that needs fixing
> may be the above paragraph if that is the case.
> 

Yes, "we will check the 'pointee'" itself makes the reader confused. I
will fix the above paragraph. Actually we do not check the "pointee",
but check the symref content. Will fix this in the next version.

> > There is no specification about the content of the symbolic ref.
> > Although we do write "ref: %s\n" to create a symbolic ref by using
> > "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> > accept symbolic refs with null trailing garbage. Put it more specific,
> > the following are correct:
> >
> > 1. "ref: refs/heads/master   "
> > 2. "ref: refs/heads/master   \n  \n"
> > 3. "ref: refs/heads/master\n\n"
> >
> > But we do not allow any non-null trailing garbage.
> 
> Your use of word "null" is probably too confusing to contributors to
> this project.  None of the above has NUL bytes in them.  I think you
> want to say something like this:
> 
>     A regular file is accepted as a textual symbolic ref if it
>     begins with "ref:", followed by zero or more whitespaces,
>     followed by the full refname (e.g. "refs/heads/master",
>     "refs/tags/v1.0"), followed only by whitespace characters.  We
>     always write a single SP after "ref:" and a single LF after the
>     full refname, but third-party reimplementations of Git may have
>     taken advantage of the looser syntax that is allowed as above.
> 

Thanks for your suggestion. I will improve this in the next version.

> > The following are bad
> > symbolic contents which will be reported as fsck error by "git-fsck(1)".
> >
> > 1. "ref: refs/heads/master garbage\n"
> > 2. "ref: refs/heads/master \n\n\n garbage  "
> >
> > In order to provide above checks, we will use "strrchr" to check whether
> > we have newline in the ref content.
> 
> strrchr() to look for only LF is overly strict.  You need to match
> what refs/files-backend.c:read_ref_internal() does to the contents
> read from such a loose ref file, i.e. strbuf_rtrim().  Any isspace()
> bytes are trimmed at the end, including SP, HT, CR and LF.
> 

I will look into how "strbuf_rtrim" does to see whether we can reuse
some functions to avoid repetition.

> > +static int files_fsck_symref_target(struct fsck_options *o,
> > +				    struct fsck_ref_report *report,
> > +				    const char *refname,
> > +				    struct strbuf *pointee_name,
> > +				    struct strbuf *pointee_path)
> > +{
> > +	const char *newline_pos = NULL;
> > +	const char *p = NULL;
> > +	struct stat st;
> > +	int ret = 0;
> > +
> > +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> > +
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to ref outside the refs directory");
> > +		goto out;
> > +	}
> > +
> > +	newline_pos = strrchr(p, '\n');
> > +	if (!newline_pos || *(newline_pos + 1)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_REF_MISSING_NEWLINE,
> > +				      "missing newline");
> 
> If newline_pos is NULL, it is truly a "missing newline" situation.
> If I am reading the code correctly, the severity level is set to
> INFO, which is good.
> 
> If newline_pos is not NULL but newline_pos[1] is not NUL, however,
> that is not a "missing newline".  "refs: refs/heads/master\n " would
> trigger this report, for example.
> 

When I design this, I actually consider "ref: refs/heads/master\n " is
still missing the newline. And then we also report that it has garbage.
I think "ref: refs/heads/master\n \n" is not missing the newline. But, I
don't think this is good.

I will find a good way to handle this.

> As far as I can tell, such a textual symbolic ref is taken as a
> valid symbolic ref pointing at "refs/heads/master" by
> refs/files-backend.c:read_ref_internal(), so we are trying to detect
> a valid but curiously formatted textual symbolic ref file with the
> above code?

Yes, these situations will be taken as a valid symbolic ref but actually
there are something wrong. So this is what we need to care about.

> 
> And strrchr() to find the last LF is not sufficient for that
> purpose.  We would never write "refs:  refs/head/master \n",
> but the above code will find the LF, be satisified that the LF is
> followed by NUL, without realizing that SP there is not something we
> would have written!

I totally ignored this situation, and in current patch, we cannot check
this. I know why Patrick lets me use "strchr" but not "strrchr". I think
we should find the last '\n'. But instead we need to find the first
'\n'. However, in this example, we will still fail by using "strchr".
This part should be totally re-designed.

> 
> I am not sure if that is worth detecting that if it is something we
> would have written, but if that were the case, then you would
> probably need to do
> 
>     (1) check the last byte of pointee_name.buf[] to make sure that
>         it is LF; and
>     (2) remember pointee_name.len, run strbuf_rtrim() on pointee_name,
>         and that LF at the end was the only thing that was trimmed by
>         checking the pointee_name.len after trimming.
> 
> or something like that.  Then you do not have to have an ugly "oh we
> need to check again"---the production code would not do that, either.
> 

Yes, this is a good idea.

> > +	if (check_refname_format(pointee_name->buf, 0)) {
> > +		/*
> > +		 * When containing null-garbage, "check_refname_format" will
> > +		 * fail, we should trim the "pointee" to check again.
> > +		 */
> > +		strbuf_rtrim(pointee_name);
> > +		if (!check_refname_format(pointee_name->buf, 0)) {
> > +			ret = fsck_report_ref(o, report,
> > +					      FSCK_MSG_TRAILING_REF_CONTENT,
> > +					      "trailing null-garbage");
> > +			goto out;
> > +		}
> 
> IOW, the above "let's retry" feels totally wrong.  You shouldn't
> have to do so, and that comes from running check_refname_format()
> before rtrimming the pointee_name.
> 

Yes, actually, I have thought I could compare the length change after
executing the "strbuf_rtrim". I don't want to create two new variables,
so I call "check_refname_format" twice.

Will fix this in the next version.

> > +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to an invalid file type");
> > +		goto out;
> 
> I do not think it is wrong per se, but I am not sure if this check
> is needed, either.  When "git fsck" or "git refs verify" is told to
> check the loose refs, wouldn't it walk the refs directory and report
> such an unusual filesystem entity that is not a regular file,
> symbolic link, or a directory as "there is unusual cruft exist
> here"?

When setting up the infrastructure, actually we DO report filesystem
entity that is not a regular file or symbolic link like the following:

    if (S_ISDIR(iter->st.st_mode)) {
        continue;
    } else if (S_ISREG(iter->st.st_mode) ||
               S_ISLNK(iter->st.st_mode)) {
        ...;
    } else {
      // report file system error
    }

We do not check the directory, because the directory will be always
valid in the filesystem. we could not say that

  "refs/heads/a/" is a bad ref.

So, this check mainly need to check whether the symref points to a
directory. Actually, Patrick has also gave the review about this
question in the previous version:

> What exactly are we guarding against here? Don't we already verify that
> files in `refs/` have the correct type? Or are we checking that it does
> not point to a directory?

However, we should remove this line, because "check_refname_format" will
take care for us.

  git check-ref-format 'refs/heads/'

It will generate an error. So, we could entirely remove this line and
let "check_refname_format" do this. And we could also remove the

    if (lstat(pointee_path->buf, &st) < 0)
        goto out;

The code will be much more clean.

Thanks,
Jialuo




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux