On Wed, Aug 28, 2024 at 12:08:07AM +0800, shejialuo wrote: > We have already introduced the checks for regular refs. There is no need > to check the consistency of the target which the symbolic ref points to. > Instead, we just check the content of the symbolic ref itself. > > In order to check the content of the symbolic ref, create a function > "files_fsck_symref_target". It will first check whether the "pointee" is > under the "refs/" directory and then we will check the "pointee" itself. > > There is no specification about the content of the symbolic ref. > Although we do write "ref: %s\n" to create a symbolic ref by using > "git-symbolic-ref(1)" command. However, this is not mandatory. We still > accept symbolic refs with null trailing garbage. Put it more specific, > the following are correct: > > 1. "ref: refs/heads/master " > 2. "ref: refs/heads/master \n \n" > 3. "ref: refs/heads/master\n\n" Now that we're talking about tightening the rules for direct refs, I wonder whether we'd also want to apply the same rules to symrefs. Namely, when there is trailing whitespace we should generate an INFO-level message about that, too. This is mostly for the sake of consistency. [snip] > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index fc074fc571..85fd058c81 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -28,6 +28,9 @@ > `badRefName`:: > (ERROR) A ref has an invalid format. > > +`badSymrefPointee`:: > + (ERROR) The pointee of a symref is bad. I think we'd want to clarify what "bad" is supposed to mean. Like, is a missing symref pointee bad? If this is about the format of the pointee's name, we might want to call this "badSymrefPointeeName". Also, I think we don't typically call the value of a symbolic ref "pointee", but "target". Searching for "pointee" in our codebase only gives a single hit, and that one is not related to symbolic refs. > diff --git a/fsck.h b/fsck.h > index b85072df57..cbe837f84c 100644 > --- a/fsck.h > +++ b/fsck.h > @@ -34,6 +34,7 @@ enum fsck_msg_type { > FUNC(BAD_REF_CONTENT, ERROR) \ > FUNC(BAD_REF_FILETYPE, ERROR) \ > FUNC(BAD_REF_NAME, ERROR) \ > + FUNC(BAD_SYMREF_POINTEE, ERROR) \ > FUNC(BAD_TIMEZONE, ERROR) \ > FUNC(BAD_TREE, ERROR) \ > FUNC(BAD_TREE_SHA1, ERROR) \ > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 69c00073eb..382c73fcf7 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3434,11 +3434,81 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, > const char *refs_check_dir, > struct dir_iterator *iter); > > +/* > + * Check the symref "pointee_name" and "pointee_path". The caller should > + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name" > + * would be the content after "refs:". > + */ > +static int files_fsck_symref_target(struct fsck_options *o, > + struct fsck_ref_report *report, > + const char *refname, > + struct strbuf *pointee_name, > + struct strbuf *pointee_path) > +{ > + const char *newline_pos = NULL; > + const char *p = NULL; > + struct stat st; > + int ret = 0; > + > + if (!skip_prefix(pointee_name->buf, "refs/", &p)) { > + > + ret = fsck_report_ref(o, report, > + FSCK_MSG_BAD_SYMREF_POINTEE, > + "points to ref outside the refs directory"); > + goto out; > + } > + > + newline_pos = strrchr(p, '\n'); > + if (!newline_pos || *(newline_pos + 1)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_REF_MISSING_NEWLINE, > + "missing newline"); > + } The second condition `*(newline_pos + 1)` checks whether there is any data after the newline, doesn't it? That indicates a different kind of error than "missing newline", namely that there is trailing garbage. I guess we'd want to report a separate info-level message for this. Also, shouldn't we use `strchr` instead of `strrchr()`? Otherwise, we're only checking for trailing garbage after the _last_ newline, not after the first one. > + if (check_refname_format(pointee_name->buf, 0)) { > + /* > + * When containing null-garbage, "check_refname_format" will > + * fail, we should trim the "pointee" to check again. > + */ > + strbuf_rtrim(pointee_name); > + if (!check_refname_format(pointee_name->buf, 0)) { > + ret = fsck_report_ref(o, report, > + FSCK_MSG_TRAILING_REF_CONTENT, > + "trailing null-garbage"); > + goto out; > + } Ah, I didn't get at first that we're doing the check a second time here. As mentioned above, I think we should check for trailing garbage further up already and more explicitly. > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh > index 7c1910d784..69280795ca 100755 > --- a/t/t0602-reffiles-fsck.sh > +++ b/t/t0602-reffiles-fsck.sh > @@ -176,4 +176,58 @@ test_expect_success 'regular ref content should be checked' ' > test_cmp expect err > ' > > +test_expect_success 'symbolic ref content should be checked' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + branch_dir_prefix=.git/refs/heads && > + tag_dir_prefix=.git/refs/tags && > + cd repo && > + git commit --allow-empty -m initial && > + git checkout -b branch-1 && > + git tag tag-1 && > + git checkout -b a/b/branch-2 && > + > + printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline && > + git refs verify 2>err && > + cat >expect <<-EOF && > + warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline > + EOF > + rm $branch_dir_prefix/branch-1-no-newline && > + test_cmp expect err && Same comments here as in the preceding patch for the tests. Patrick