Re: git status: small difference between stating whole repository and small subdirectory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 20, 2012 at 12:08:03PM -0800, Junio C Hamano wrote:

> >   4. At the end of unpack_trees, we forget about src_index, and copy
> >      o->result into *o->dst_index byte for byte. I.e., we overwrite
> >      the_index.cache_tree, which has been properly updated the whole
> >      time,
> 
> I strongly suspect that "properly updated" part needs to be thoroughly
> audited.  I wouldn't be surprised that this behaviour is what we did when
> we split src_index vs dst_index when he rewrote unpack_trees() in order to
> emulate the original "unpack-trees is beyond salvation because it does not
> maintain cache tree correctly, just nuke it" behaviour.

Yep, I am also concerned about that.

> > But it does not actually insert the _destination_ tree into the cache
> > tree. Which we can do in certain situations, but only if there were no
> > paths in the tree that were left unchanged (e.g., you modify "foo", then
> > "git checkout HEAD^", which updates "bar". Your tree does not match
> > HEAD^, and must be invalidated).  While it would be cool to be able to
> > handle those complex cases,...
> 
> It may look cool but it may not be a good change. You are spending extra
> cycles to optimize for the next write-tree that may not happen before the
> index is further updated.

I don't think it would be too many cycles; you would have to mark each
tree you enter as having items from the left-hand tree or the right-hand
tree. If only one, you can reuse the cache-tree entry (or tree sha1, if
coming from a tree). Otherwise, you must invalidate.  And it doesn't
just help the next write-tree, but any intermediate index diffs.

Of course any such change would need timings to justify it, though.

That being said, I think just invalidating really covers 99% of the
cases. What we really care about is that when I modify kernel/foo.c, the
~2300 other directories (besides "" and "kernel") don't need rebuilt,
and that is relatively simple to do. Even if doing it the other way
produced a tiny speedup, I would be concerned with the increase in code
complexity.

> > I think this implementation matches the intent of the original calls to
> > cache_tree_invalidate_path sprinkled throughout unpack-trees.c.
> 
> Yes, and as long as we invalidate all the directories that need to be
> invalidated during the unpack-tree operation, I think it is a correct
> thing to do.

OK. I'll do some reading of the code to convince myself that the
unpack_trees callbacks are doing the right thing. I'm not sure of a good
automatic test that would detect a failure there. Just making test cases
is going to end up too contrived, unless we are missing something really
obvious.

I'm thinking maybe something like replaying the commit history of
linux-2.6 and making sure that each the tree generated by the cache-tree
in each case matches the actual committed tree.

> > But I
> > have to say that it seems a little odd for us to be modifying the
> > o->src_index throughout the whole thing.
> 
> Yes, that part is logically *wrong*.  I think it is a remnant from the
> days when there was no distinction between src_index and dst_index.

OK. I'll include a fix for that in the series I prepare.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]