Re: [PATCH v2 4/8] packed-backend: add "packed-refs" header consistency check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 30, 2025 at 10:58:32AM -0800, Junio C Hamano wrote:
> shejialuo <shejialuo@xxxxxxxxx> writes:
> 
> > In "packed-backend.c::create_snapshot", if there is a header (the line
> > which starts with '#'), we will check whether the line starts with "#
> > pack-refs with:". As we are going to implement the header consistency
> > check, we should port this check into "packed_fsck".
> >
> > However, the above check is not enough, this is because "git pack-refs"
> > will always write "PACKED_REFS_HEADER" which is a constant string to the
> > "packed-refs" file. So, we should check the following things for the
> > header.
> 
> I haven't done history digging in this area for a while, but we
> should make sure we are not flagging a file that was written in
> ancient version of Git whose repository is still supported.
> 

Understood.

> > 1. If the header does not exist, we may report an error to the user
> >    because it should exist, but we do allow no header in "packed-refs"
> >    file. So, create a new fsck message "packedRefMissingHeader(INFO)" to
> >    warn the user and also keep compatibility.
> 
> Are we sure "it should exist"?  I think the header did not exist
> before "Git v1.5.0".  I didn't check with other reimplementations of
> Git (like jgit or libgit2), but as long as our reading side of the
> runtime allows a packed-refs file without the header without
> complaint, I do not think it is a good idea to treat it as a
> report-worthy event from "git fsck".
> 

OK, let me improve this in the next version.

> > 2. If the header content does not start with "# packed-ref with:", we
> >    should report an error just like what "create_snapshot" does. So,
> >    create a new fsck message "badPackedRefHeader(ERROR)" for this.
> 
> This I can agree with.  If the first line begins with "#" but not
> with that string (with a trailing SP), that is a sign that it may
> not even be a valid packed-refs file, which is a report-worthy
> event.
> 
> > 3. If the header content is not the same as the constant string
> >    "PACKED_REFS_HEADER", ideally, we should report an error to the user.
> 
> NO.  THAT IS NOT IDEAL AT ALL.
> 
> The header was written like this:
> 
>         /* perhaps other traits later as well */
>         fprintf(cbdata.refs_file, "# pack-refs with: peeled \n");
> 
> in the older versions of Git before it was made into a separate
> preprocessor macro and lost the comment (the above excerpt is from
> "git show v1.5.0:builtin-pack-refs.c").
> 
> Notice "other traits later" in the comment?
> 
> The thing is _designed_ to be extensible.  In fact, these days we
> support a few more traits
> 
>         static const char PACKED_REFS_HEADER[] =
>                 "# pack-refs with: peeled fully-peeled sorted \n";
> 
> (an excerpt from the current refs/packed-backend.c).
> 
> Reporting an error when you see something written by an older
> version of Git is far from ideal.
> 

Understood, I think we should be consistency with the runtime check.

> >    However, we allow other contents as long as the header content starts
> >    with "# packed-ref with:". To keep compatibility, create a new fsck
> >    message "unknownPackedRefHeader(INFO)" to warn about this. We may
> >    tighten this rule in the future.
> 
> Whatever we do, what we do with an unknown trait should be in line
> with what the runtime does.  If the runtime failed (we do not, but
> this is to illustrate the principle [*]) on a packed-refs file
> without "sorted" trait, noticing that "sorted" is not there and
> flagging as an error is a good thing to do.  But if the runtime
> gracefully degrades and sorts the list of refs read from such a
> packed-refs file before continuing, then a packed-refs file that
> lack "sorted" trait is not a report-worthy event.
> 

Actually, the runtime won't complain about this. I agree with you here.

> I do not offhand recall if we introduced the concept of mandatory vs
> optional traits in the packed-refs part of the system (like we have
> in the index extension subsystem, where a version of Git that
> encounters an unknown *and* mandatory index extension must refuse to
> touch the repository), but if there is a mandatory trait declared in
> the header that our version of Git does not understand, it is a
> report-worthy event that must be flagged with "git refs verify".
> 

I don't think any trait in "packed-refs" is mandatory. Because I have
done some experiments before implementing the code. We should only check
case 2 here.

> > +static int packed_fsck_ref_header(struct fsck_options *o, const char *start, const char *eol)
> > +{
> > +	const char *err_fmt = NULL;
> > +	int fsck_msg_id = -1;
> > +
> > +	if (!starts_with(start, "# pack-refs with:")) {
> > +		err_fmt = "'%.*s' does not start with '# pack-refs with:'";
> > +		fsck_msg_id = FSCK_MSG_BAD_PACKED_REF_HEADER;
> > +	} else if (strncmp(start, PACKED_REFS_HEADER, strlen(PACKED_REFS_HEADER))) {
> > +		err_fmt = "'%.*s' is an unknown packed-refs header";
> > +		fsck_msg_id = FSCK_MSG_UNKNOWN_PACKED_REF_HEADER;
> > +	}
> 
> As I outlined above, this is totally unacceptable.  
> 
> Inspecting the header is good, but if this code claims to be a
> checker, it should do at least what the runtime does, i.e. parse the
> header to tell what traits the packed-file declares, not just
> assuming that it is a fixed string.  And error on unknown trait(s)
> if they are mandatory (if such a concept is implemented in the
> runtime reading side).  Informing on an unknown and optional
> trait(s) I can live with, but personally I wouldn't recommend it.
> 

Got it, I don't want to report unknown trait(s) either.

> In other words, report loudly if it is an error, but otherwise stay
> silent if we know we tolerate it well. 
> 

Thanks for this suggestion.

> > +static int packed_fsck_ref_content(struct fsck_options *o,
> > +				   const char *start, const char *eof)
> > +{
> > +	struct strbuf packed_entry = STRBUF_INIT;
> > +	int line_number = 1;
> 
> We limit ourselves with about 1 billion refs in the packed-refs
> file, which may be plenty,

Let me change this to `size_t`. This would be better.

> but I do not quite understand the use of
> this variable.  There is no loop inside this so ...
> 

The reason why I define this variable is that I am going to use loop to
check each entry in the next patch.

> > +	const char *eol;
> > +	int ret = 0;
> > +
> > +	strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
> 
> ... this is always line #1, and then
> 
> > +	ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
> > +	if (*start == '#') {
> > +		ret |= packed_fsck_ref_header(o, start, eol);
> > +
> > +		start = eol + 1;
> > +		line_number++;
> 
> ... it may be incremented, but upon returning from the funcition, it
> is lost.
> 
> Perhaps you wanted to make it a function-scope static, but then you
> are allowed to read one single packed-refs file during the life of
> your process before you exit, which I am not sure is what you want?
> 

Actually, what I want is use this variable for looping the each ref
entry in the "packed-refs" file.

> > +	} else {
> > +		struct fsck_ref_report report = { 0 };
> > +		report.path = "packed-refs";
> > +
> > +		ret |= fsck_report_ref(o, &report,
> > +				       FSCK_MSG_PACKED_REF_MISSING_HEADER,
> > +				       "missing header line");
> > +	}
> > +
> > +	strbuf_release(&packed_entry);
> > +	return ret;
> > +}

Thanks,
Jialuo




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux