Re: [PATCH v4 4/5] ref: add symref content check for files backend

shejialuo <shejialuo@xxxxxxxxx> writes:

Expect that people do not read the body of the message as completing
a paragraph the title started.  I.e. ...

> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symref points to.
> Instead, we just need to check the content of the symref itself.

... this needs a bit of preamble, like

    We have code that checks regular ref contents, but we do not yet
    check contents of symbolic refs.

> A regular file is accepted as a textual symref if it begins with
> "ref:", followed by zero or more whitespaces, followed by the full
> refname, followed only by whitespace characters. We always write
> a single SP after "ref:" and a single LF after the refname, but
> third-party reimplementations of Git may have taken advantage of the
> looser syntax. Put it more specific, we accept the following contents
> of the symref:
>
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
>
> Thus, we could reuse "refMissingNewline" and "trailingRefContent"
> FSCK_INFOs to do the same retroactive tightening as we introduce for
> regular references.
>
> But we do not allow any other trailing garbage. The followings are bad
> symref contents which will be reported as fsck error by "git-fsck(1)".

This description needs to be updated, as it is unclear if you are
talking about errors we already detect, or if you are planning to
update fsck to notice and report these errors.

> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
>
> And we introduce a new "badReferentName(ERROR)" fsck message to report
> above errors to the user.

OK.
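
To make the accepted and rejected shapes quoted above concrete, here
is a minimal standalone sketch.  It is not the code in the files
backend; classify_symref() and its enum are hypothetical names, the
"first run of non-whitespace bytes is the refname" rule is a
simplification, and refname validity (what check_refname_format()
verifies) is deliberately not modeled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, for illustration only. */
enum symref_shape {
	SYMREF_NOT_A_SYMREF,     /* does not begin with "ref:" */
	SYMREF_OK,               /* refname followed only by whitespace */
	SYMREF_TRAILING_GARBAGE  /* non-whitespace bytes after the refname */
};

/*
 * Classify the raw contents of a loose ref file.  Assumption: the
 * refname is the first run of non-whitespace bytes after "ref:" and
 * the optional whitespace that follows it.
 */
static enum symref_shape classify_symref(const char *buf)
{
	const char *p;

	if (strncmp(buf, "ref:", 4))
		return SYMREF_NOT_A_SYMREF;

	p = buf + 4;
	/* zero or more whitespace bytes after "ref:" */
	while (*p && isspace((unsigned char)*p))
		p++;
	/* the refname itself */
	while (*p && !isspace((unsigned char)*p))
		p++;
	/* only whitespace may follow the refname */
	while (*p && isspace((unsigned char)*p))
		p++;

	return *p ? SYMREF_TRAILING_GARBAGE : SYMREF_OK;
}
```

With this sketch the three "accepted" strings classify as SYMREF_OK
and the two "garbage" strings as SYMREF_TRAILING_GARBAGE.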

> In order to check the content of the symref, create a function
> "files_fsck_symref_target". It will first check whether the "referent"
> is under the "refs/" directory, if not, we will report "escapeReferent"
> fsck error message to notify the user this situation.
>
> Then, we will first check whether the symref content misses the newline
> by peeking the last byte of the "referent" to see whether it is '\n'.

"Then, we will first" -> "Then it checks" or something.

You already consumed "first" for the check to limit the referent to
those under the "refs/" hierarchy.

> And we will remember the untrimmed length of the "referent" and call
> "strbuf_rtrim()" on "referent". Then, we will call "check_refname_format"
> to check whether the trimmed referent format is valid. If not, we will
> report to the user that the symref points to referent which has invalid
> format. If it is valid, we will compare the untrimmed length and trimmed
> length, if they are not the same, we need to warn the user there is some
> trailing garbage in the symref content.

That is an implementation detail of what you did.  But if the
implementation were buggy and did not do exactly what you intended,
the above description gives others no information to help them fix
it so that it works as intended, because you do not explain what
you wanted to achieve.

So what did you want to achieve in the third step (the first being
"limit to refs/ hierarchy", the second being "no incomplete lines
allowed")?

    Third, we want to make sure that the contents of a textual
    symref MUST have a single LF after the target refname and
    NOTHING ELSE.

or something.

> At last, we need to check whether the referent is a directory. We cannot

"a directory" -> "an existing directory"?

I am not comfortable seeing the word "directory" used in this
proposed log message, as some refs could be stored in the packed
backend and are referenced by the symbolic ref you are inspecting
(this comment also applies to the "refs/ directory" you mentioned
earlier as "the first check").

    Lastly, a symbolic ref MUST either point to an existing ref,
    or if the referent does not exist, it MUST NOT be a leading
    subpath for another existing ref (e.g., when "refs/heads/main"
    exists, a symbolic ref that points at "refs/heads" is a no-no).

or something (but again, I am open to a phrasing better than
"subpath").

Design question.  What do we want to do when we have no loose refs
under the "refs/heads/historical/" hierarchy (i.e. all of them are
in the packed-refs file), hence the ".git/refs/heads/historical"
directory does not exist on the filesystem, and a symbolic ref
points at "refs/heads/historical"?  Shouldn't we give the same error
whether the .git/refs/heads/historical directory exists or not, as
long as the refs/heads/historical/main branch exists (in the
packed-refs backend)?
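
One way to phrase the check so that it does not depend on what is on
the filesystem is to compare refnames rather than stat paths.  The
following is only a sketch under that assumption: is_leading_subpath()
is a hypothetical helper, not an existing function, and a real check
would have to iterate over the refs known to both the loose and the
packed backend.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 when "referent" is a proper leading
 * path of "refname" at a '/' boundary, e.g. "refs/heads" for
 * "refs/heads/main".  Return 0 otherwise, including when the two
 * names are identical.
 */
static int is_leading_subpath(const char *referent, const char *refname)
{
	size_t len = strlen(referent);

	/* Ignore trailing slashes such as "refs/heads/historical/". */
	while (len && referent[len - 1] == '/')
		len--;
	if (!len)
		return 0;

	/*
	 * If the first len bytes match, refname has at least len bytes,
	 * so reading refname[len] is safe (it may be the NUL).
	 */
	return !strncmp(referent, refname, len) && refname[len] == '/';
}
```

With such a predicate, a symref pointing at "refs/heads/historical"
would be flagged whenever "refs/heads/historical/main" exists, no
matter which backend stores it.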

> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 8827137ef0..03bcb77972 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -28,6 +28,12 @@
>  `badRefName`::
>  	(ERROR) A ref has an invalid format.
>  
> +`badReferentFiletype`::
> +	(ERROR) The referent of a symref has a bad file type.
> +
> +`badReferentName`::
> +	(ERROR) The referent name of a symref is invalid.
> +
>  `badTagName`::
>  	(INFO) A tag has an invalid format.
>  
> @@ -49,6 +55,9 @@
>  `emptyName`::
>  	(WARN) A path contains an empty name.
>  
> +`escapeReferent`::
> +	(ERROR) The referent of a symref is outside the "ref" directory.

I am not sure starting this as ERROR is wise.  Users and third-party
tools make creative uses of the system and I cannot offhand think of
an argument why it should be forbidden to create a symbolic link to
our own HEAD or to some worktree-specific ref in another worktree.

> +	size_t len = referent->len - 1;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!starts_with(referent->buf, "refs/")) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_ESCAPE_REFERENT,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	if (referent->buf[referent->len - 1] != '\n') {

As you initialized "len" to "referent->len-1" earlier, wouldn't it
be more natural to use it here?  That would match the incrementing
of len++ later in this block.

> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_REF_MISSING_NEWLINE,
> +				      "missing newline");
> +		len++;
> +	}

Having said that, the above should be simplified more like:

 * declare but do not initialize "len".  Better yet, declare
   "orig_len" and leave it uninitialized.

 * do not touch "len++" in the above block (actually, you can
   discard the above "if(it does not end with LF)" block, see
   below).

 * instead grab "referent->len" in "len" (or "orig_len") immediately
   before you first modify referent, i.e. before strbuf_rtrim() call.

	orig_len = referent->len;
	orig_last_byte = referent->buf[orig_len - 1];

> +	strbuf_rtrim(referent);
> +	if (check_refname_format(referent->buf, 0)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_REFERENT_NAME,
> +				      "points to refname with invalid format");

Similar to an earlier step, the message does not give any more
information than the enum.  Wouldn't the user who got this error
want to learn what referent->buf said and which part of it was bad
in the same message, instead of having to look it up on their own
after fsck finishes?
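
As a rough illustration of a message that carries the offending
value, here is a standalone sketch.  format_bad_referent() is a
hypothetical helper and the wording of the message is only an
assumption; it does not pinpoint which part of the name is bad, it
merely shows the referent alongside the complaint.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a diagnostic that includes the referent
 * itself.  Returns what snprintf() returns, i.e. the length the full
 * message needs, which may exceed outsz when the output is truncated.
 */
static int format_bad_referent(char *out, size_t outsz, const char *referent)
{
	return snprintf(out, outsz,
			"points to invalid refname '%s'", referent);
}
```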

> +		goto out;
> +	}

At this point we know check_refname_format() is happy with what is
left after rtrimming the referent.  There are four cases:

 - rtrim() did not trim anything (orig_len == referent->len); the file
   lacked the terminating LF.

 - rtrim() trimmed one byte (orig_len - 1 == referent->len) and
   the byte was not LF (orig_last_byte != '\n').  The file lacked
   the terminating LF.

 - rtrim() trimmed exactly one byte (orig_len - 1 == referent->len)
   and the byte was LF (orig_last_byte == '\n').  There is no error.

 - all other cases, i.e., rtrim() trimmed two or more bytes.  The
   file had trailing whitespaces after a valid referent that passed
   check_refname_format().

So in short,

	if (referent->len == orig_len ||
	    (referent->len == orig_len - 1 && orig_last_byte != '\n')) {
		FSCK_MSG_REF_MISSING_NEWLINE;
	} else if (referent->len < orig_len - 1) {
		FSCK_MSG_REF_TRAILING_WHITESPACE;
	}

can replace the next block you wrote, and we can also remove the
earlier "it is an error if it does not end with '\n'", I think.
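
The four cases can be exercised outside Git with a small standalone
sketch.  Everything here is an assumption made for illustration: the
enum and both function names are hypothetical, and the standard
isspace() stands in for whatever strbuf_rtrim() treats as whitespace.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes mirroring the cases listed above. */
enum trailing_kind {
	TRAILING_OK,               /* exactly one LF after the referent */
	TRAILING_MISSING_NEWLINE,  /* nothing, or one non-LF byte, trimmed */
	TRAILING_WHITESPACE        /* two or more bytes were trimmed */
};

/*
 * Classify using only the untrimmed length, the last untrimmed byte
 * and the trimmed length.  Written as "trimmed_len + 1" to avoid
 * size_t underflow when orig_len is 0.
 */
static enum trailing_kind classify_trailing(size_t orig_len,
					    char orig_last_byte,
					    size_t trimmed_len)
{
	if (trimmed_len == orig_len ||
	    (trimmed_len + 1 == orig_len && orig_last_byte != '\n'))
		return TRAILING_MISSING_NEWLINE;
	if (trimmed_len + 1 < orig_len)
		return TRAILING_WHITESPACE;
	return TRAILING_OK;
}

/* Convenience wrapper: right-trim a NUL-terminated string and classify. */
static enum trailing_kind classify_referent(const char *s)
{
	size_t orig_len = strlen(s), len = orig_len;

	if (!orig_len)
		return TRAILING_MISSING_NEWLINE;
	while (len && isspace((unsigned char)s[len - 1]))
		len--;
	return classify_trailing(orig_len, s[orig_len - 1], len);
}
```

Note that with this ordering a referent followed by two or more
whitespace bytes but no final LF is classified as trailing whitespace
only, which matches the "all other cases" bucket above.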

> +	if (len != referent->len) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_TRAILING_REF_CONTENT,
> +				      "trailing garbage in ref");

As check_refname_format() was happy, the difference between orig_len
and referent->len comes only from trailing whitespaces, i.e. it is
not that it had arbitrary garbage.  Shouldn't we be more explicit
about that?

> +	/*
> +	 * Dangling symrefs are common and so we don't report them.
> +	 */
> +	if (lstat(referent_path->buf, &st)) {
> +		if (errno != ENOENT) {
> +			ret = error_errno(_("unable to stat '%s'"),
> +					  referent_path->buf);
> +		}
> +		goto out;
> +	}
> +
> +	/*
> +	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
> +	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
> +	 * check the file type of the target.
> +	 */
> +	if (S_ISDIR(st.st_mode)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_REFERENT_FILETYPE,
> +				      "points to the directory");
> +		goto out;
> +	}

If referent_path->buf refers to "refs/heads/historical/", and all
the branches under the hierarchy have been sent to packed-refs,
then this check will not trigger.

I wonder if this check is the right thing to enforce in the first
place, though.

As far as the end user is concerned, the refs/heads/historical/master
branch still exists, and there is no refs/heads/historical branch, so
such a symbolic ref, for all intents and purposes, is the same as
any other dangling symbolic ref, no?

Of course, "git update-ref SUCH_A_SYMREF HEAD" will complain because
there is refs/heads/historical, with something like 

    "refs/heads/historical/master" exists, cannot create "refs/heads/historical"

but that is to be expected.  If you remove the last branch in the
refs/heads/historical hierarchy, you should be able to do such an
update-ref to instantiate refs/heads/historical as a regular ref.

> @@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
>  					      "trailing garbage in ref");
>  			goto cleanup;
>  		}
> +	} else {
> +		strbuf_addf(&referent_path, "%s/%s",
> +			    ref_store->gitdir, referent.buf);
> +		/*
> +		 * the referent may contain the spaces and the newline, need to
> +		 * trim for path.
> +		 */
> +		strbuf_rtrim(&referent_path);

I doubt this is a good design.  We have referent, and the symbolic
ref checker knows that the true referent refname may be followed by
whitespaces, so instead of inventing referent_path here, it would
be a better design to let files_fsck_symref_target() decide what
file to open and check based on referent, no?  Give it the refstore
or the refstore's gitdir and have it do the concatenation with the
rtrimmed contents of referent->buf after it has inspected it
instead, perhaps?
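
A standalone sketch of that shape, using plain C buffers instead of
strbuf so it can be compiled on its own.  build_referent_path() is a
hypothetical name, and the standard isspace() is assumed to be close
enough to what strbuf_rtrim() strips.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<gitdir>/<referent>" with the trailing
 * whitespace of the referent dropped, so the caller only has to hand
 * over the gitdir and the raw referent.  Returns 0 on success and -1
 * if the result does not fit in "out".
 */
static int build_referent_path(char *out, size_t outsz,
			       const char *gitdir, const char *referent)
{
	size_t len = strlen(referent);
	int n;

	/* Trim trailing whitespace without modifying the input. */
	while (len && isspace((unsigned char)referent[len - 1]))
		len--;

	n = snprintf(out, outsz, "%s/%.*s", gitdir, (int)len, referent);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

The point of the sketch is only that the trimming and the path
construction live in one place, next to the code that already knows
the referent may carry trailing whitespace.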

> +		ret = files_fsck_symref_target(o, &report,
> +					       &referent,
> +					       &referent_path);



