On Tue, Sep 03, 2024 at 08:20:46PM +0800, shejialuo wrote: > We implicitly rely on "git-fsck(1)" to check the consistency of regular > refs. However, when parsing the regular refs for files backend by using > "files-backend.c::parse_loose_ref_contents", we allow the ref content to > end with no newline or to contain some garbages. > > Even though we never create such loose refs ourselves, we have accepted > such loose refs. So, it is entirely possible that some third-party tools > may rely on such loose refs being valid. We should not report an error > fsck message at current. But let's notice such a "curiously formatted" > loose refs being valid and tell the user our findings, so we can access > the possible extent of damage when we tighten the parsing rules in the > future. > > And it's not suitable to either report a warn fsck message to the user. s/to either/either to > This is because if the caller set the "strict" field in "fsck_options", > fsck warns will be automatically upgraded to errors. We should not allow > user to specify the "--strict" flag to upgrade the fsck warnings to > errors at current. This is formulated a bit curiously: it reads as if we wanted to limit what the user can do, but what we really want to ensure is that the `--strict` flag doesn't convert it into an error. So maybe something like this instead of the second sentence: We don't (yet) want the "--strict" flag that controls this bit to end up generating errors for such weirdly-formatted reference contents, as we first want to assess whether this retroactive tightening will cause issues for any tools out there. > It might cause compatibility issue which may break s/issue/issues > the legacy repository. So we add the following two fsck infos to I wouldn't call it "legacy" just yet, as we didn't yet decide whether we're going to make this formatting invalid in the first place. It's rather a test balloon. > represent the situation where the ref content ends without newline or has > garbages: s/garbages/trailing garbage > 1. "refMissingNewline(INFO)": A ref does not end with newline. This kind > of ref may be considered ERROR in the future. > 2. "trailingRefContent(INFO)": A ref has trailing contents. This kind of > ref may be considered ERROR in the future. In both cases, "may be considered ERROR" -> "may be considered an error". Also in the actual messages. > It may seem that we could not give the user any warnings by creating > fsck infos. However, in "fsck.c::fsck_vreport", we will convert > "FSCK_INFO" to "FSCK_WARN" and we can still warn the user about these > situations when using "git-refs verify" without introducing s/"git-refs verify"/"git refs verify". We don't use dashed builtins nowadays anymore. > compatibility issue. s/issue/issues > In current "git-fsck(1)", it will report an error when the ref content > is bad, so we should following this to report an error to the user when > "parse_loose_ref_contents" fails. And we add a new fsck error message > called "badRefContent(ERROR)" to represent that a ref has a bad content. Okay, so this is basically porting over behaviour that git-fsck(1) already has to `git refs verify` and should thus not cause new issues anywhere. I think it would have made sense to do so in a first step and then introduce the tightened rules in a separate commit. Will we eventually remove those checks from git-fsck(1) when we adapt it to call `git refs verify`? If so, we should likely note that in the commit message. > In order to tell whether the ref has trailing content, add a new > parameter "trailing" to "parse_loose_ref_contents". Then introduce a new > function "files_fsck_refs_content" to check the regular refs to enhance > the "git-refs verify". This paragraph only re-explains what the diff already tells us, so it can likely be removed. > Mentored-by: Patrick Steinhardt <ps@xxxxxx> > Mentored-by: Karthik Nayak <karthik.188@xxxxxxxxx> > Signed-off-by: shejialuo <shejialuo@xxxxxxxxx> > --- > Documentation/fsck-msgids.txt | 11 ++++ > fsck.h | 3 + > refs.c | 2 +- > refs/files-backend.c | 68 ++++++++++++++++++- > refs/refs-internal.h | 2 +- > t/t0602-reffiles-fsck.sh | 120 ++++++++++++++++++++++++++++++++++ > 6 files changed, 202 insertions(+), 4 deletions(-) > > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt > index 68a2801f15..06d045ac48 100644 > --- a/Documentation/fsck-msgids.txt > +++ b/Documentation/fsck-msgids.txt > @@ -19,6 +19,9 @@ > `badParentSha1`:: > (ERROR) A commit object has a bad parent sha1. > > +`badRefContent`:: > + (ERROR) A ref has a bad content. > + s/a bad content/bad content > `badRefFiletype`:: > (ERROR) A ref has a bad file type. > > @@ -170,6 +173,14 @@ > `nullSha1`:: > (WARN) Tree contains entries pointing to a null sha1. > > +`refMissingNewline`:: > + (INFO) A ref does not end with newline. This kind of ref may > + be considered ERROR in the future. > + I'd reformulate the second sentence to "This will be considered an error in the future". This indicates that we have the intent to tighten this check to any user and would urge them to speak up in case they disagree with such a tightening. > +`trailingRefContent`:: > + (INFO) A ref has trailing contents. This kind of ref may be > + considered ERROR in the future. Same. > @@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, > const char *refs_check_dir, > struct dir_iterator *iter); > > +static int files_fsck_refs_content(struct ref_store *ref_store, > + struct fsck_options *o, > + const char *refs_check_dir, > + struct dir_iterator *iter) > +{ > + struct strbuf ref_content = STRBUF_INIT; > + struct strbuf referent = STRBUF_INIT; > + struct strbuf refname = STRBUF_INIT; > + struct fsck_ref_report report = {0}; > + const char *trailing = NULL; > + unsigned int type = 0; > + int failure_errno = 0; > + struct object_id oid; > + int ret = 0; > + > + strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path); > + report.path = refname.buf; > + > + if (S_ISLNK(iter->st.st_mode)) > + goto cleanup; > + > + if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) { > + ret = error_errno(_("%s/%s: unable to read the ref"), > + refs_check_dir, iter->relative_path); We typically have the name of things we read trailing and not leading in error messages. So this should rather be "unable do read ref '%s/%s'". > + goto cleanup; > + } > + > + if (parse_loose_ref_contents(ref_store->repo->hash_algo, > + ref_content.buf, &oid, &referent, > + &type, &trailing, &failure_errno)) { > + ret = fsck_report_ref(o, &report, > + FSCK_MSG_BAD_REF_CONTENT, > + "invalid ref content"); > + goto cleanup; > + } > + > + if (!(type & REF_ISSYMREF)) { Coming back to my comment further up, I guess this whole block here could be introduced in a separate commit. So the first commit introduces the infra to check loose ref contents as an obvious step because we simply port over rules that already exist in git-fsck(1). And the second step could then do this retroactive tightening with the justification you have spelt out in the commit message. > + if (*trailing == '\0') { `if (!*trailing)` Patrick