Re: [PATCH 2/2] pickaxe: use textconv for -S counting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 19, 2012 at 04:48:22PM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> >> Exact renames are the obvious one, but they are not handled here.
> >
> > That is half true.  Before this change, we will find the same number
> > of needles and this function would have said "no differences" in a
> > very inefficient way.  After this change, we may apply different
> > textconv filters and this function will say "there is a difference",
> > even though we wouldn't see such a difference at the content level
> > if there wasn't any rename.
> 
> ... but I think that is a good thing anyway.
> 
> If you renamed foo.c to foo.cc with different conversions from C
> code to the text that explain what the code does, if we special case
> only the exact rename case but let pickaxe examine the converted
> result in a case where blobs are modified only by one byte, we would
> get drastically different results between the two cases.

Right, exactly. I think the only sane thing is to always textconv or
always not textconv (whether they are identical renames or not), and any
"these are the same" optimization for identical content needs to take
into account whether we _would have_ done a different textconv (which
most of the time is going to be "no", as textconv is either not in use,
or both paths use the same diff driver; but it is not too expensive to
look up).

The diff_unmodified_pair at the top off diff_flush_patch is correct,
because it treats renames as interesting (because we have to show the
diff header, anyway). I do not know offhand if we avoid feeding
identical content to xdiff at all, but if so, we should be doing so only
after checking that the textconv filters are identical.

> Corollary to this is what should happen when you update the attributes
> between two trees so that textconv for a path that did not change
> between preimage and postimage are different.  Ideally, we should
> notice that the two converted result are different, perhaps, but I
> do not like the performance implications very much.

The content to compare cannot be different unless either the input
content changed or the path changed, and we treat either as
"interesting" in most code paths. So I do not think there are any
performance implications, except that we may need to make sure to look
up textconvs a few lines sooner in some cases.

I'll re-roll the series next week and break out the rename-optimization
bits separately so it is more obvious that it is doing the right thing.

As an aside, I also need to revisit the regex half of that code, which
is still buggy (before and after my patch, due to the expecting-a-NUL
behavior we talked about a week or two ago).  That is a separate topic,
but the same area of code.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]