On Thu, Jan 16, 2025 at 02:57:37PM +0100, Patrick Steinhardt wrote: > On Sun, Jan 05, 2025 at 09:49:37PM +0800, shejialuo wrote: > > Add a new flag "safe_object_check" in "fsck_options", when there is > > anything wrong with the parsing process, set this flag to 0 to avoid > > checking objects in the later checks. > > Okay, I understand the motivation: a corrupted refdb may be completely > bogus, so checking its objects may not be sensible. > > For one of the preceding commits I made the suggestion to split out the > object checks into a generic part instead, as they aren't specific to > the backend. With such a scheme we could adapt the logic to first do the > backend-specific checks for the format, and only in case the backend > looks sane to us we'd execute those generic checks for that specific > backend. That'd allow us to get rid of the "safe object check" flag. > Yes, I agree with you here. And I won't touch this topic in the next version. Let me make this patch concentrate on the "packed-ref" format. > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > > index d9eb2f8b71..3b11abe5f8 100644 > > --- a/refs/packed-backend.c > > +++ b/refs/packed-backend.c > > @@ -1748,12 +1748,100 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s > > return empty_ref_iterator_begin(); > > } > > > > +static int packed_fsck_ref_next_line(struct fsck_options *o, > > + int line_number, const char *start, > > + const char *eof, const char **eol) > > +{ > > + int ret = 0; > > + > > + *eol = memchr(start, '\n', eof - start); > > + if (!*eol) { > > + struct strbuf packed_entry = STRBUF_INIT; > > + struct fsck_ref_report report = { 0 }; > > + > > + strbuf_addf(&packed_entry, "packed-refs line %d", line_number); > > + report.path = packed_entry.buf; > > + ret = fsck_report_ref(o, &report, > > + FSCK_MSG_PACKED_REF_ENTRY_NOT_TERMINATED, > > + "'%.*s' is not terminated with a newline", > > + (int)(eof - start), start); > > + > > + /* > > + * There is no newline but we still want to parse it to the end of > > + * the buffer. > > + */ > > + *eol = eof; > > I don't quite understand. We've figured out that there isn't a newline, > so wouldn't that mean that we _are_ at the end of the buffer already? > In the "packed-refs" file, the last line should end with a newline. If not, this is a fatal error. The motivation why I do this is that for each line, we could pass the "line_start" and "eol" to check. But if there is no newline, the "eol" will be NULL. So, I change it to "eof" to make sure that we could follow the same logic when "eol" is not NULL. I guess I should not handle this in this function which may cause confusion here. I will improve this in the next version. > > + strbuf_release(&packed_entry); > > + } > > + > > + return ret; > > +} > > + > > +static int packed_fsck_ref_header(struct fsck_options *o, const char *start, const char *eol) > > +{ > > + const char *err_fmt = NULL; > > + int fsck_msg_id = -1; > > + > > + if (!starts_with(start, "# pack-refs with:")) { > > + err_fmt = "'%.*s' does not start with '# pack-refs with:'"; > > + fsck_msg_id = FSCK_MSG_BAD_PACKED_REF_HEADER; > > + } else if (strncmp(start, PACKED_REFS_HEADER, strlen(PACKED_REFS_HEADER))) { > > + err_fmt = "'%.*s' is not the official packed-refs header"; > > I wouldn't say "official", because it could totally be that whatever is > official changes in the future, e.g. when a new format is introduced. > Unlikely to happen, but saying "unknown packed-refs header" might be a > bit more future proof. > I will improve this in the next version. > > + fsck_msg_id = FSCK_MSG_UNKNOWN_PACKED_REF_HEADER; > > + } > > + > > + if (err_fmt && fsck_msg_id >= 0) { > > + struct fsck_ref_report report = { 0 }; > > + report.path = "packed-refs.header"; > > + > > + return fsck_report_ref(o, &report, fsck_msg_id, err_fmt, > > + (int)(eol - start), start); > > + > > + } > > + > > + return 0; > > +} > > + > > +static int packed_fsck_ref_content(struct fsck_options *o, > > + const char *start, const char *eof) > > +{ > > + int line_number = 1; > > + const char *eol; > > + int ret = 0; > > + > > + ret |= packed_fsck_ref_next_line(o, line_number, start, eof, &eol); > > + if (*start == '#') { > > + ret |= packed_fsck_ref_header(o, start, eol); > > + > > + start = eol + 1; > > + line_number++; > > The header can only appear at the beginning of the file, can't it? But > we accept it in every line here. We should likely verify that it's > actually a header and not a line at some random place. > Yes. But we don't accept it in every line. Because in here, we are getting the first line "start" and "eol" by using "packed_fsck_ref_next_line". Only it starts with "#", we will check the header consistency. > > + } else { > > + struct fsck_ref_report report = { 0 }; > > + report.path = "packed-refs"; > > + > > + ret |= fsck_report_ref(o, &report, > > + FSCK_MSG_PACKED_REF_MISSING_HEADER, > > + "missing header line"); > > + } > > + > > + /* > > + * If there is anything wrong during the parsing of the "packed-refs" > > + * file, we should not check the object of the refs. > > + */ > > + if (ret) > > + o->safe_object_check = 0; > > + > > + > > + return ret; > > +} > > + > > static int packed_fsck(struct ref_store *ref_store, > > struct fsck_options *o, > > struct worktree *wt) > > { > > struct packed_ref_store *refs = packed_downcast(ref_store, > > REF_STORE_READ, "fsck"); > > + struct strbuf packed_ref_content = STRBUF_INIT; > > struct stat st; > > int ret = 0; > > > > @@ -1779,7 +1867,24 @@ static int packed_fsck(struct ref_store *ref_store, > > goto cleanup; > > } > > > > + if (strbuf_read_file(&packed_ref_content, refs->path, 0) < 0) { > > + /* > > + * Although we have checked that the file exists, there is a possibility > > + * that it has been removed between the lstat() and the read attempt by > > + * another process. In that case, we should not report an error. > > + */ > > + if (errno == ENOENT) > > + goto cleanup; > > Unlikely, but good to guard us against that condition regardless. It's > still not entirely race-free though because the file could meanwhile > have changed into a symlink, and we wouldn't notice now. We could fix > that by using open(O_NOFOLLOW), fstat the returne file descriptor and > then use `strbuf_read()` to slurp in the file. > Would this be too complicated for us to avoid race condition and we will introduce a lot of code to handle above logic. Because there is a possibility that when finishing reading the file content to the memory, the file could be changed into a symlink and we cannot notice. So, I wanna say we can't avoid race condition totally. It would be good if we avoid race, but what I am concern about here is that we would make the logic too complicated. So, could we make it unchanged?