Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry

Nicolas Pitre <nico@xxxxxxxxxxx> · Fri, 13 Sep 2013 22:06:05 -0400 (EDT)

On Fri, 13 Sep 2013, Duy Nguyen wrote:

> On Fri, Sep 13, 2013 at 8:27 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> > On Thu, 12 Sep 2013, Nguyễn Thái Ngọc Duy wrote:
> >
> >> The intention is to store flat v4 trees in delta base cache to avoid
> >> repeatedly expanding copy sequences in v4 trees. When the user needs
> >> to unpack a v4 tree and the tree is found in the cache, the tree will
> >> be converted back to canonical format. Future tree_desc interface may
> >> skip canonical format and read v4 trees directly.
> >>
> >> For that to work we need to keep track of v4 tree size after all copy
> >> sequences are expanded, which is the purpose of this new field.
> >
> > Hmmm.... I think this is going in a wrong direction.
> 
> Good thing you caught me early. I was planning to implement a better
> version of this on the weekend. And you are not wrong about code
> maintainability, unpack_entry() so far looks very close to a real
> mess.

Yes.  We might have to break it into sub-functions.  However this has 
the potential to recurse rather deeply so it is necessary to limit its 
stack footprint as well.

> > Yet, pavkv4 tree walking shouldn't need a cache since there is nothing
> > to expand in the end.  Entries should be advanced one by one as they are
> > needed.  Granted when converting back to a canonical object we need all
> > of them, but eventually this shouldn't be the main mode of operation.
> 
> There's another case where one of the base tree is not v4 (the packer
> is inefficient, like my index-pack --fix-thin). For trees with leading
> zeros in entry mode, we can just do a lossy conversion to v4, but I
> wonder if there is a case where we can't even convert to v4 and the v4
> treewalk interface has to fall back to canonical format.. I guess that
> can't happen.

Well... There is little gain in converting loose tree objects into pv4 
format so there will always be a place for the canolical tree parser.

Eventually the base trees added by index-pack should be pv4 trees.  The 
only case where this might not work is for zero padded mode trees and we 
don't have to optimize for that i.e. a fallback to the canonical tree 
format will be good enough.

> > However I can see that, as you say, the same base object is repeatedly
> > referenced.  This means that we need to parse it over and over again
> > just to find the right offset where the needed entries start.  We
> > probably end up skipping over the first entries in a tree object
> > multiple times.  And that would be true even when the core code learns
> > to walk pv4 trees directly.
> >
> > So here's the beginning of a tree offset cache to mitigate this problem.
> > It is incomplete as the cache function is not implemented, etc.  But
> > that should give you the idea.
> 
> Thanks. I'll have a closer look and maybe complete your patch.

I've committed an almost final patch and pushed it out.  It is disabled 
right now due to some bug I've not found yet.  But you can have a look.

Nicolas