On Thu, Jun 20, 2024 at 10:24:37AM -0700, Junio C Hamano wrote: > shejialuo <shejialuo@xxxxxxxxx> writes: [snip] > This seems to be doing too many things at once, making the result a > lot harder to review than necessary. At this point, nobody checks > refs and reports problems with refs, so fsck_refs_report() has no > callers and it is impossible to tell if the function signature of > it, iow, the set of parameters it receives, is sufficient, for > example. > I will split this commit into multiple commits in the next version. > Stepping back a bit, it is true that (1) all existing checks are > about "objects", and (2) all checks we want to implement around > "objects" and "refs" can be split cleanly into these two categories? > I am sure that "objects" and "refs" checks cannot be split cleanly into two categories. Let me elaborate more about this. The git-fsck(1) has already implicitly checked refs. When executing git-fsck(1) with no arguments, it will call "builtin/fsck.c::get_default_heads()". It will traverse every raw ref and use "fsck_handle_ref" to check the refs. struct obj *obj; obj = parse_object(the_repository, oid); if (!obj) { ... errors_found |= ERROR_REACHABLE; return 0; } if (obj->type != OBJ_COMMIT && is_branch(refname)) { ... errors_found |= ERROR_REFS; } obj->flags |= USED; mark_object_reachable(obj); return 0; git-fsck(1) firstly scans the object directory to check every loose object. If everything is OK, it will set "obj->flags" like the following: obj->flags &= ~(REACHABLE | SEEN); obj->flags |= HAS_OBJ; When an object is marked with "UNREACHABLE" and "UNUSED", it will be dangling. When checking objects, git-fsck(1) still checks the refs. But from my understanding, the main motivation here is mark the "USED" flag because we want to tell whether an object is dangling and BTY we check refs. So, we cannot really classify into these two categories. But we can classify into these two categories from action view. When checking refs, we look at the refs options; when checking objects, we look at the objects options. Although we may implicitly check some other things, this should be OK. So, I think "fsck_options" is only about options. Because the nature of the connectivity between refs and commit objects, we cannot make the boundary so clear. > I am wondering if there are checks and reports that would benefit > from having access to both objects and refs (e.g. when checking a > ref, you may want to see both what the name of the ref is and what > object the ref points at), in which case, being forced to implement > such a check-and-report as "object" or "ref" that has access to only > different subset of information may turn out to be too limiting. > Should "check-and-report" does this? I am doubt. There are some cases we do look at both object and ref. But "report" should not do this. The flow git-fsck(1) does is like the following: if (some_thing_bad) { ret = report(..., message); } Here, "check" will provide dedicated message to report. So If we really both look at both object and ref, the caller should be responsible for that. Like the message "refs/heads/foo points at f665776185ad074b236c00751d666da7d1977dbe which is a tag". The caller should produce such message. It will call some APIs to get the information and pass it to the "report" function. > Yes, I am OK with having substructure in fsck_options, but I am > doubting if it is a good idea to have a separate fsck_refs_report() > that can only take "name" that is different from fsck.c::report(). > > For example, how would we ensure that refs/heads/foo is allowed to > point at a commit object and nothing else, and how would we report a > violation when we find that ref/heads/foo is pointing at a tag, > i.e., "refs/heads/foo points at > f665776185ad074b236c00751d666da7d1977dbe which is a tag". The > fsck_refs_report() function is not equipped to do that; neither is > .refs_options.error_func() that only takes "name". > Yes, I am not satisfied with current design either. Actually, there are many redundant code in new "fsck_refs_report" function. Let's look at "fsck.c::report()" closely to see what it does. static int report(struct fsck_options *options, const struct object_id *oid, enum object_type object_type, enum fsck_msg_id msg_id, const char *fmt, ...) { va_list ap; struct strbuf sb = STRBUF_INIT; enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options); int result; if (msg_type == FSCK_IGNORE) return 0; if (object_on_skiplist(options, oid)) return 0; if (msg_type == FSCK_FATAL) msg_type = FSCK_ERROR; else if (msg_type == FSCK_INFO) msg_type = FSCK_WARN; prepare_msg_ids(); strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased); va_start(ap, fmt); strbuf_vaddf(&sb, fmt, ap); result = options->objs_options.error_func(options, oid, object_type, msg_type, msg_id, sb.buf); strbuf_release(&sb); va_end(ap); return result; } It will get the current message "msg_type" which would be configured and it will convert the "FSCK_FATAL" to "FSCK_ERROR", and "FSCK_INFO" to "FSCK_WARN". From above code, we can easily know what would the error (warn) message be the following: 'object: fsck_describe_object(o, oid): FSCK_MESSAGE: message'. The "fsck_describe_object" aims to provide meaningful message about the oid. And "FSCK_MESSAGE" is "msg_id_info[msg_id]".The last field "message" is the caller-provided message. So the hardest part to reuse this function is that it receives the parameters related to object such as `oid` and `object_type`. Actually, the "fsck_refs_report" could generate such information "refs/heads/foo points at f665776185ad074b236c00751d666da7d1977dbe which is a tag" like the following (I use string literal for simplicity here): if (some_thing_bad) { ... ret = fsck_refs_report(o, "refs/heads/foo", FSCK_BAD_REF_CONTENT, "points at f665776185ad074b236c00751d666da7d1977dbe which is a tag") } It will generate the following output in the current design: error: refs/heads/foo: badRefContent: points at f665776185ad074b236c00751d666da7d1977dbe which is a tag So my idea is that we just reuse the "report" function here. The bad part for current "report" function is that it is highly coupled with the object check. However, my idea is just like the "fsck_refs_report". <name>: <FSCK_MESSAGE_STRING>: <message> Thus, we may unify the report interface both for refs and objects. And we don't define two differtn interface: 1. "fsck_refs_error" 2. "fsck_objs_error" Also, we just need to use "fsck_refs_report" and "fsck_objs_report" to wrap the "report" function here to avoid redundance. > > +int fsck_refs_report(struct fsck_options *o, > > + const char *name, > > + enum fsck_msg_id msg_id, > > + const char *fmt, ...) > > ... > > + va_start(ap, fmt); > > + strbuf_vaddf(&sb, fmt, ap); > > + ret = o->refs_options.error_func(o, name, msg_type, msg_id, sb.buf); > > + strbuf_release(&sb); > > + va_end(ap); > > Perhaps the code and data structure of the entire series may be > capable of supporting such a check-and-report, but the primary point > I am making is that among what [1/7] adds, we cannot sanely judge if > these "refs" related additions are sensible by looking at [1/7]. >