Re: [PATCH v3 16/23] ref-filter: make cat_file_info independent

Jeff King <peff@xxxxxxxx> · Fri, 16 Feb 2018 16:54:43 -0500

On Thu, Feb 15, 2018 at 01:33:25PM +0300, Оля Тележная wrote:

> 2018-02-15 8:53 GMT+03:00 Jeff King <peff@xxxxxxxx>:
> > On Mon, Feb 12, 2018 at 08:08:54AM +0000, Olga Telezhnaya wrote:
> >
> >> Remove connection between expand_data variable
> >> in cat-file and in ref-filter.
> >> It will help further to get rid of using expand_data in cat-file.
> >
> > I have to admit I'm confused at this point about what is_cat_file is
> > for, or even why we need cat_file_data. Shouldn't these items be handled
> > by their matching ref-filter atoms at this point?
> 
> We discussed that earlier outside of mailing list, and I even tried to
> implement that idea and spent a couple of days to prove that it's not
> possible.
> The problem is that the list of atoms is made dynamically, and we
> can't store pointers to any values in each atom. That's why we need
> separate cat_file_info variable that is outside of main atom list.
> We also need is_cat_file because we still have some part of logic that
> is different for cat-file and for all other commands, and sometimes we
> need to know that information.

OK, right, I forgot about the pointers thing. This is definitely the
sort of design decision that should go into the commit message.

I went over our off-list discussion again to refresh my memory. Let me
try to summarize a bit (for myself, or for other readerss). Naively, one
would hope that you could do something like:

  1. While parsing the %(objectsize) atom, add a pointer like:

       the_object_info->sizep = &atom.size;

  2. Later when fulfilling the request for a specific object, if
     the_object_info is not empty, then call sha1_object_info() with it
     to hold the results.

  3. Read the value out of atom.size when assembling the formatted
     string.

But that has two problems:

  a. During parsing, we're resizing the atom array, so the &atom.size
     pointer is subject to change.  This is the issue you mentioned
     above. I think we could solve it by taking a pass over the atoms
     after parsing and filling in the pointers then. But...

  b. Another problem is that the same atom may appear multiple times. So
     parsing "%(objectsize) %(objectsize)", we can't point
     the_object_info at _both_ of them. We need our call to
     sha1_object_info() to store the result in a single location, and
     then as we format each atom they can both use that (which works
     even if they might format it differently, say if you had
     "%(deltabase) %(deltabase:abbrev)".

So we do need some kind of global storage to hold these results. This is
sort of the same problem we have with parsing the objects at all. There
we get by without a global because all of the logic is crammed into
populate_value(). In theory we could do the same thing here, but there
are just a lot of different items we might be asking for. So instead of
"need_tagged", you'd have a multitude of need_objectsize_disk,
need_deltabase, etc, which gets unwieldy. We may as well fill in a
"struct object_info".

So OK, I agree that something like "struct expand_data" is needed.

But what seems non-ideal to me is the way that ref-filter depends on it
in this cat-file-specific way. If it needs those values it should be
calling sha1_object_info(). And if it doesn't, then it can skip the
call. It shouldn't need to know about cat-file at all.

And likewise cat-file should not know about expand_data. And I think
that's true by the end of the series, though the steps in the middle
left me a little confused.

-Peff