On Sun, Jan 05, 2025 at 09:49:37PM +0800, shejialuo wrote: > Add a new flag "safe_object_check" in "fsck_options", when there is > anything wrong with the parsing process, set this flag to 0 to avoid > checking objects in the later checks. Okay, I understand the motivation: a corrupted refdb may be completely bogus, so checking its objects may not be sensible. For one of the preceding commits I made the suggestion to split out the object checks into a generic part instead, as they aren't specific to the backend. With such a scheme we could adapt the logic to first do the backend-specific checks for the format, and only in case the backend looks sane to us we'd execute those generic checks for that specific backend. That'd allow us to get rid of the "safe object check" flag. > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index d9eb2f8b71..3b11abe5f8 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -1748,12 +1748,100 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s > return empty_ref_iterator_begin(); > } > > +static int packed_fsck_ref_next_line(struct fsck_options *o, > + int line_number, const char *start, > + const char *eof, const char **eol) > +{ > + int ret = 0; > + > + *eol = memchr(start, '\n', eof - start); > + if (!*eol) { > + struct strbuf packed_entry = STRBUF_INIT; > + struct fsck_ref_report report = { 0 }; > + > + strbuf_addf(&packed_entry, "packed-refs line %d", line_number); > + report.path = packed_entry.buf; > + ret = fsck_report_ref(o, &report, > + FSCK_MSG_PACKED_REF_ENTRY_NOT_TERMINATED, > + "'%.*s' is not terminated with a newline", > + (int)(eof - start), start); > + > + /* > + * There is no newline but we still want to parse it to the end of > + * the buffer. > + */ > + *eol = eof; I don't quite understand. We've figured out that there isn't a newline, so wouldn't that mean that we _are_ at the end of the buffer already? > + strbuf_release(&packed_entry); > + } > + > + return ret; > +} > + > +static int packed_fsck_ref_header(struct fsck_options *o, const char *start, const char *eol) > +{ > + const char *err_fmt = NULL; > + int fsck_msg_id = -1; > + > + if (!starts_with(start, "# pack-refs with:")) { > + err_fmt = "'%.*s' does not start with '# pack-refs with:'"; > + fsck_msg_id = FSCK_MSG_BAD_PACKED_REF_HEADER; > + } else if (strncmp(start, PACKED_REFS_HEADER, strlen(PACKED_REFS_HEADER))) { > + err_fmt = "'%.*s' is not the official packed-refs header"; I wouldn't say "official", because it could totally be that whatever is official changes in the future, e.g. when a new format is introduced. Unlikely to happen, but saying "unknown packed-refs header" might be a bit more future proof. > + fsck_msg_id = FSCK_MSG_UNKNOWN_PACKED_REF_HEADER; > + } > + > + if (err_fmt && fsck_msg_id >= 0) { > + struct fsck_ref_report report = { 0 }; > + report.path = "packed-refs.header"; > + > + return fsck_report_ref(o, &report, fsck_msg_id, err_fmt, > + (int)(eol - start), start); > + > + } > + > + return 0; > +} > + > +static int packed_fsck_ref_content(struct fsck_options *o, > + const char *start, const char *eof) > +{ > + int line_number = 1; > + const char *eol; > + int ret = 0; > + > + ret |= packed_fsck_ref_next_line(o, line_number, start, eof, &eol); > + if (*start == '#') { > + ret |= packed_fsck_ref_header(o, start, eol); > + > + start = eol + 1; > + line_number++; The header can only appear at the beginning of the file, can't it? But we accept it in every line here. We should likely verify that it's actually a header and not a line at some random place. > + } else { > + struct fsck_ref_report report = { 0 }; > + report.path = "packed-refs"; > + > + ret |= fsck_report_ref(o, &report, > + FSCK_MSG_PACKED_REF_MISSING_HEADER, > + "missing header line"); > + } > + > + /* > + * If there is anything wrong during the parsing of the "packed-refs" > + * file, we should not check the object of the refs. > + */ > + if (ret) > + o->safe_object_check = 0; > + > + > + return ret; > +} > + > static int packed_fsck(struct ref_store *ref_store, > struct fsck_options *o, > struct worktree *wt) > { > struct packed_ref_store *refs = packed_downcast(ref_store, > REF_STORE_READ, "fsck"); > + struct strbuf packed_ref_content = STRBUF_INIT; > struct stat st; > int ret = 0; > > @@ -1779,7 +1867,24 @@ static int packed_fsck(struct ref_store *ref_store, > goto cleanup; > } > > + if (strbuf_read_file(&packed_ref_content, refs->path, 0) < 0) { > + /* > + * Although we have checked that the file exists, there is a possibility > + * that it has been removed between the lstat() and the read attempt by > + * another process. In that case, we should not report an error. > + */ > + if (errno == ENOENT) > + goto cleanup; Unlikely, but good to guard us against that condition regardless. It's still not entirely race-free though because the file could meanwhile have changed into a symlink, and we wouldn't notice now. We could fix that by using open(O_NOFOLLOW), fstat the returne file descriptor and then use `strbuf_read()` to slurp in the file. > + ret = error_errno("could not read %s", refs->path); > + goto cleanup; > + } > + > + ret = packed_fsck_ref_content(o, packed_ref_content.buf, > + packed_ref_content.buf + packed_ref_content.len); > + > cleanup: > + strbuf_release(&packed_ref_content); > return ret; > } > > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh > index 307f94a3ca..6c729e749a 100755 > --- a/t/t0602-reffiles-fsck.sh > +++ b/t/t0602-reffiles-fsck.sh > @@ -646,4 +646,48 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' ' > test_cmp expect err > ' > > +test_expect_success 'packed-refs header should be checked' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + cd repo && The same comment applies here as on a preceding test: cd should be executed in a subshell. Patrick