Re: [GSoC][PATCH v4 1/7] fsck: add refs check interfaces to interact with fsck error levels

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 20 Jun 2024 10:24:37 -0700

shejialuo <shejialuo@xxxxxxxxx> writes:

> git-fsck(1) focuses on checking the consistency of the object database.
> It relies on "fsck_options" to interact with fsck error levels. However,
> "fsck_options" was designed for checking the object database and
> contains many fields that relate only to it.
>
> To add ref checks, create two new structs, "fsck_refs_options" and
> "fsck_objs_options", and move the object-related fields from
> "fsck_options" into "fsck_objs_options". "fsck_options" then consists
> of three parts:
>
> 1. The "fsck_refs_options".
> 2. The "fsck_objs_options".
> 3. The settings common to both refs and objects. Because these common
>    settings stay in "fsck_options", the setup process can be reused
>    without any code change.
>
> Also add related macros to align with the current code. Because some
> fields move from "fsck_options" to "fsck_objs_options", change the
> affected code to use "fsck_options.objs_options" instead of
> "fsck_options" itself.
>
> The static function "report" in "fsck.c" reports problems related to
> the object database and cannot be reused for refs. Provide a
> "fsck_refs_report" function to integrate the fsck error levels into
> the reference consistency check.
>
> Mentored-by: Patrick Steinhardt <ps@xxxxxx>
> Mentored-by: Karthik Nayak <karthik.188@xxxxxxxxx>
> Signed-off-by: shejialuo <shejialuo@xxxxxxxxx>
> ---

This seems to be doing too many things at once, making the result a
lot harder to review than necessary.  At this point, nothing checks
refs or reports problems with refs, so fsck_refs_report() has no
callers, and it is impossible to tell, for example, whether its
function signature, iow, the set of parameters it receives, is
sufficient.

Stepping back a bit, is it true that (1) all existing checks are
about "objects", and (2) all checks we want to implement around
"objects" and "refs" can be split cleanly into these two categories?

I am wondering if there are checks and reports that would benefit
from having access to both objects and refs (e.g., when checking a
ref, you may want to see both the name of the ref and the object
the ref points at), in which case being forced to implement such a
check-and-report as either an "object" check or a "ref" check, each
with access to only a subset of the information, may turn out to be
too limiting.

Yes, I am OK with having substructures in fsck_options, but I doubt
it is a good idea to have a separate fsck_refs_report(), distinct
from fsck.c::report(), that can take only a "name".
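To illustrate the alternative, here is a rough sketch under made-up
names: struct fsck_opts, struct report_target and report() are
hypothetical simplifications, not git's actual fsck_options or
fsck.c::report().  The options keep per-kind substructures, but a
single report function accepts a target that may name a ref, an
object, or both.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-ins: git's real struct fsck_options
 * and fsck.c::report() look different.
 */
struct fsck_objs_settings { int strict; };        /* object-only knobs */
struct fsck_refs_settings { int check_symrefs; }; /* ref-only knobs */

struct fsck_opts {
	struct fsck_objs_settings objs;
	struct fsck_refs_settings refs;
	int nr_errors;      /* common: counts reports of either kind */
	char last_msg[256]; /* common: most recent formatted report */
};

/* What a problem is about; either field may be NULL. */
struct report_target {
	const char *refname;
	const char *oid_hex;
};

/*
 * One entry point for every check: prefixes the message with whichever
 * of refname and object name is present. Returns 0 on success and -1
 * if the message did not fit into last_msg.
 */
int report(struct fsck_opts *o, const struct report_target *t,
	   const char *fmt, ...)
{
	size_t len = sizeof(o->last_msg), used = 0;
	va_list ap;
	int n;

	assert(o && t && fmt);
	o->last_msg[0] = '\0';
	if (t->refname) {
		n = snprintf(o->last_msg, len, "%s: ", t->refname);
		if (n < 0 || (size_t)n >= len)
			return -1;
		used = (size_t)n;
	}
	if (t->oid_hex) {
		n = snprintf(o->last_msg + used, len - used, "%s: ", t->oid_hex);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	va_start(ap, fmt);
	n = vsnprintf(o->last_msg + used, len - used, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	o->nr_errors++;
	return 0;
}
```

With this shape, an object-only check passes a NULL refname, a
ref-only check passes a NULL object name, and a check that needs both
simply fills in both fields, all going through the same error-level
machinery.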

For example, how would we ensure that refs/heads/foo is allowed to
point at a commit object and nothing else, and how would we report a
violation when we find that refs/heads/foo is pointing at a tag,
i.e., "refs/heads/foo points at
f665776185ad074b236c00751d666da7d1977dbe which is a tag"?  The
fsck_refs_report() function is not equipped to do that; neither is
.refs_options.error_func(), which only takes "name".
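A rough sketch of such a check, assuming a toy lookup table in place
of the object database; enum obj_type, struct fake_object and
check_branch_ref() are hypothetical names, not git API.  The point is
only that producing the message requires the ref name, the object
name and the object's type at the same time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical object types; git's own enum object_type differs. */
enum obj_type { OBJ_MISSING, OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/* Hypothetical stand-in for the object database: hex ID -> type. */
struct fake_object {
	const char *oid_hex;
	enum obj_type type;
};

static const char *obj_type_name(enum obj_type t)
{
	switch (t) {
	case OBJ_COMMIT: return "commit";
	case OBJ_TREE:   return "tree";
	case OBJ_BLOB:   return "blob";
	case OBJ_TAG:    return "tag";
	default:         return "unknown";
	}
}

static enum obj_type lookup_type(const struct fake_object *db, size_t nr,
				 const char *oid_hex)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(db[i].oid_hex, oid_hex))
			return db[i].type;
	return OBJ_MISSING;
}

/*
 * Branches (refs/heads/*) must point at commits. Returns 1 and fills
 * msg on a violation, 0 otherwise. The message needs the ref name,
 * the object name and the object's type all at once.
 */
int check_branch_ref(const struct fake_object *db, size_t nr,
		     const char *refname, const char *oid_hex,
		     char *msg, size_t len)
{
	static const char prefix[] = "refs/heads/";
	enum obj_type type;

	assert(msg && len > 0);
	msg[0] = '\0';
	if (strncmp(refname, prefix, strlen(prefix)))
		return 0; /* not a branch; this check does not apply */
	type = lookup_type(db, nr, oid_hex);
	if (type == OBJ_COMMIT)
		return 0;
	if (type == OBJ_MISSING)
		snprintf(msg, len, "%s points at %s which does not exist",
			 refname, oid_hex);
	else
		snprintf(msg, len, "%s points at %s which is a %s",
			 refname, oid_hex, obj_type_name(type));
	return 1;
}
```

A reporting callback that receives only the ref name would have to
lose the object half of this message, or smuggle it into free-form
text where no error-level configuration can see it.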

> +int fsck_refs_report(struct fsck_options *o,
> +		     const char *name,
> +		     enum fsck_msg_id msg_id,
> +		     const char *fmt, ...)
> ...
> +	va_start(ap, fmt);
> +	strbuf_vaddf(&sb, fmt, ap);
> +	ret = o->refs_options.error_func(o, name, msg_type, msg_id, sb.buf);
> +	strbuf_release(&sb);
> +	va_end(ap);

Perhaps the code and data structures of the entire series are
capable of supporting such a check-and-report, but my main point is
that we cannot sanely judge whether the "refs"-related additions in
[1/7] are sensible by looking at [1/7] alone.

Thanks.