Re: [PATCH 2/3] fsck_walk(): optionally name objects on the go

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sun, 17 Jul 2016 10:44:27 +0200 (CEST)

Hi Junio,

On Thu, 14 Jul 2016, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@xxxxxx> writes:
> 
> > Note that this patch opts for decorating the objects with plain strings
> > instead of full-blown structs (à la `struct rev_name` in the code of
> > the `git name-rev` command), for several reasons:
> >
> > - the code is much simpler than if it had to work with structs that
> >   describe arbitrarily long names such as "master~14^2~5:builtin/am.c",
> >
> > - the string processing is actually quite light-weight compared to the
> >   rest of fsck's operation,
> >
> > - the caller of fsck_walk() is expected to provide names for the
> >   starting points, and using plain and simple strings is just the
> >   easiest way to do that.
> 
> Simpler is good; we can always optimize something well-isolated like
> this later if it proves necessary.

I am glad we agree!

> > +static char *get_object_name(struct fsck_options *options, struct object *obj)
> > +{
> > +	return options->object_names ?
> > +		lookup_decoration(options->object_names, obj) : NULL;
> > +}
> > +
> > +static void put_object_name(struct fsck_options *options, struct object *obj,
> > +	const char *fmt, ...)
> > +{
> > +	va_list ap;
> > +	char *existing = lookup_decoration(options->object_names, obj);
> > +	struct strbuf buf = STRBUF_INIT;
> 
> When reading a few early calling sites, it wasn't quite obvious how
> the code avoids the "naming" when .object_names decoration is not
> initialized (which is tied to the --name-objects option to decide if
> the feature needs to be triggered).  The current "if get_object_name
> for the containing object gives us NULL, then we refrain from
> calling put_object_name()" may be good enough, but having an early
> return similar to get_object_name() would make it easier to grok,

My knee-jerk reaction was: in order to name objects in this part of the
code, we need the name of the parent/tree/whatever, so yeah, we have
object_names.

But you're right, it is much easier to read with the early returns.

And who knows, maybe the fsck.c code will learn to name the starting
points itself in the future?

> >  	while (tree_entry(&desc, &entry)) {
> >  		int result;
> >  
> > +		if (name) {
> > +			struct object *obj = parse_object(entry.oid->hash);
> 
> This worries me somewhat.  IIRC, "git fsck" uses object->parsed to
> tell between objects that are unreachable or not and act differently
> so I would fear that parsing the object here would screw up that
> logic, when the call comes from fsck_dir() -> fsck_sha1_list() ->
> fsck_sha1() -> fsck_obj() -> fsck_walk() -> fsck_walk_tree()
> codepath.  Is it no longer the case, I wonder?
> 
> I see in the same loop there is a call to lookup_tree()->object, which
> probably is how the existing code avoids that issue?

Most likely. I factored that out so that the object is looked up first,
and reused via object_as_type() for the tree/blob cases.

Both concerns will be addressed in the next iteration.

Thanks,
Dscho