Re: Set up for better tree diff optimizations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> This is mainly just a cleanup patch, and sets up for later changes where 
> the tree-diff.c "interesting()" function can return more than just a 
> yes/no value.
>
> In particular, it should be quite possible to say "no subsequent entries 
> in this tree can possibly be interesting any more", and thus allow the 
> callers to short-circuit the tree entirely.
>
> In fact, changing the callers to do so is trivial, and is really all this 
> patch really does, because changing "interesting()" itself to say that 
> nothing further is going to be interesting is definitely more complicated, 
> considering that we may have arbitrary pathspecs.
> ...
> Junio, your decision.

Sorry, as I was sick and spent almost all day in bed, I am
somewhat behind going through the mailing list backlog.

>  tree-diff.c |   44 ++++++++++++++++++++++++++++++++++----------
>  1 files changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/tree-diff.c b/tree-diff.c
> index f89b9d3..2e0a3ae 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -66,7 +66,15 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
>  	return 0;
>  }
>  
> -static int interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt)
> +/*
> + * Is a tree entry interesting given the pathspec we have?
> + *
> + * Return:
> + *  - positive for yes
> + *  - zero for no
> + *  - negative for "no, and no subsequent entries will be either"
> + */
> +static int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt)
>  {
>  	const char *path;
>  	const unsigned char *sha1;

I've already applied the patch, but I do not know how much this
interface would help, as the only easy case the function
tree_entry_interesting() can say "no subsequent entries will be
either" without looking ahead is where no pathspecs match the
base, but that is already prevented by the way you walk the tree
(you do not descend into an uninteresting tree).

If you do:

	$ git log arch/i386/ include/asm-i386/

and if the traversal is already in arch/i386/crypto, then it is
unnecessary to check if a path discovered during the traversal
is interesting, as the base ("arch/i386/crypto") for all the
entries in that subdirectory is already covered by one of the
pathspecs ("arch/i386").  It also means checking the other
pathspec "include/asm-i386/" is a waste of time, because that
will not match the base no matter what path is discovered from
the tree object anyway.

I suspect that two possible optimizations are:

 (1) instead of, or in addition to, allowing it to say "no
     subsequent ones will match", allow it to say "everything
     for this base will be interesting, stop bothering to ask me
     one path at a time".

 (2) mark include/asm-i386/ pathspec to be N/A while we are
     already in arch/i386/crypto, as we already know it will
     never match.  And remove that mark when we come back from
     that recursion.


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]