On Thu, Jan 30, 2025 at 10:58:32AM -0800, Junio C Hamano wrote: > shejialuo <shejialuo@xxxxxxxxx> writes: > > > In "packed-backend.c::create_snapshot", if there is a header (the line > > which starts with '#'), we will check whether the line starts with "# > > pack-refs with:". As we are going to implement the header consistency > > check, we should port this check into "packed_fsck". > > > > However, the above check is not enough, this is because "git pack-refs" > > will always write "PACKED_REFS_HEADER" which is a constant string to the > > "packed-refs" file. So, we should check the following things for the > > header. > > I haven't done history digging in this area for a while, but we > should make sure we are not flagging a file that was written in > ancient version of Git whose repository is still supported. > Understood. > > 1. If the header does not exist, we may report an error to the user > > because it should exist, but we do allow no header in "packed-refs" > > file. So, create a new fsck message "packedRefMissingHeader(INFO)" to > > warn the user and also keep compatibility. > > Are we sure "it should exist"? I think the header did not exist > before "Git v1.5.0". I didn't check with other reimplementations of > Git (like jgit or libgit2), but as long as our reading side of the > runtime allows a packed-refs file without the header without > complaint, I do not think it is a good idea to treat it as a > report-worthy event from "git fsck". > OK, let me improve this in the next version. > > 2. If the header content does not start with "# packed-ref with:", we > > should report an error just like what "create_snapshot" does. So, > > create a new fsck message "badPackedRefHeader(ERROR)" for this. > > This I can agree with. If the first line begins with "#" but not > with that string (with a trailing SP), that is a sign that it may > not even be a valid packed-refs file, which is a report-worthy > event. > > > 3. If the header content is not the same as the constant string > > "PACKED_REFS_HEADER", ideally, we should report an error to the user. > > NO. THAT IS NOT IDEAL AT ALL. > > The header was written like this: > > /* perhaps other traits later as well */ > fprintf(cbdata.refs_file, "# pack-refs with: peeled \n"); > > in the older versions of Git before it was made into a separate > preprocessor macro and lost the comment (the above excerpt is from > "git show v1.5.0:builtin-pack-refs.c"). > > Notice "other traits later" in the comment? > > The thing is _designed_ to be extensible. In fact, these days we > support a few more traits > > static const char PACKED_REFS_HEADER[] = > "# pack-refs with: peeled fully-peeled sorted \n"; > > (an excerpt from the current refs/packed-backend.c). > > Reporting an error when you see something written by an older > version of Git is far from ideal. > Understood, I think we should be consistency with the runtime check. > > However, we allow other contents as long as the header content starts > > with "# packed-ref with:". To keep compatibility, create a new fsck > > message "unknownPackedRefHeader(INFO)" to warn about this. We may > > tighten this rule in the future. > > Whatever we do, what we do with an unknown trait should be in line > with what the runtime does. If the runtime failed (we do not, but > this is to illustrate the principle [*]) on a packed-refs file > without "sorted" trait, noticing that "sorted" is not there and > flagging as an error is a good thing to do. But if the runtime > gracefully degrades and sorts the list of refs read from such a > packed-refs file before continuing, then a packed-refs file that > lack "sorted" trait is not a report-worthy event. > Actually, the runtime won't complain about this. I agree with you here. > I do not offhand recall if we introduced the concept of mandatory vs > optional traits in the packed-refs part of the system (like we have > in the index extension subsystem, where a version of Git that > encounters an unknown *and* mandatory index extension must refuse to > touch the repository), but if there is a mandatory trait declared in > the header that our version of Git does not understand, it is a > report-worthy event that must be flagged with "git refs verify". > I don't think any trait in "packed-refs" is mandatory. Because I have done some experiments before implementing the code. We should only check case 2 here. > > +static int packed_fsck_ref_header(struct fsck_options *o, const char *start, const char *eol) > > +{ > > + const char *err_fmt = NULL; > > + int fsck_msg_id = -1; > > + > > + if (!starts_with(start, "# pack-refs with:")) { > > + err_fmt = "'%.*s' does not start with '# pack-refs with:'"; > > + fsck_msg_id = FSCK_MSG_BAD_PACKED_REF_HEADER; > > + } else if (strncmp(start, PACKED_REFS_HEADER, strlen(PACKED_REFS_HEADER))) { > > + err_fmt = "'%.*s' is an unknown packed-refs header"; > > + fsck_msg_id = FSCK_MSG_UNKNOWN_PACKED_REF_HEADER; > > + } > > As I outlined above, this is totally unacceptable. > > Inspecting the header is good, but if this code claims to be a > checker, it should do at least what the runtime does, i.e. parse the > header to tell what traits the packed-file declares, not just > assuming that it is a fixed string. And error on unknown trait(s) > if they are mandatory (if such a concept is implemented in the > runtime reading side). Informing on an unknown and optional > trait(s) I can live with, but personally I wouldn't recommend it. > Got it, I don't want to report unknown trait(s) either. > In other words, report loudly if it is an error, but otherwise stay > silent if we know we tolerate it well. > Thanks for this suggestion. > > +static int packed_fsck_ref_content(struct fsck_options *o, > > + const char *start, const char *eof) > > +{ > > + struct strbuf packed_entry = STRBUF_INIT; > > + int line_number = 1; > > We limit ourselves with about 1 billion refs in the packed-refs > file, which may be plenty, Let me change this to `size_t`. This would be better. > but I do not quite understand the use of > this variable. There is no loop inside this so ... > The reason why I define this variable is that I am going to use loop to check each entry in the next patch. > > + const char *eol; > > + int ret = 0; > > + > > + strbuf_addf(&packed_entry, "packed-refs line %d", line_number); > > ... this is always line #1, and then > > > + ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol); > > + if (*start == '#') { > > + ret |= packed_fsck_ref_header(o, start, eol); > > + > > + start = eol + 1; > > + line_number++; > > ... it may be incremented, but upon returning from the funcition, it > is lost. > > Perhaps you wanted to make it a function-scope static, but then you > are allowed to read one single packed-refs file during the life of > your process before you exit, which I am not sure is what you want? > Actually, what I want is use this variable for looping the each ref entry in the "packed-refs" file. > > + } else { > > + struct fsck_ref_report report = { 0 }; > > + report.path = "packed-refs"; > > + > > + ret |= fsck_report_ref(o, &report, > > + FSCK_MSG_PACKED_REF_MISSING_HEADER, > > + "missing header line"); > > + } > > + > > + strbuf_release(&packed_entry); > > + return ret; > > +} Thanks, Jialuo