Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid

Kirill Smelkov <kirr@xxxxxxxxxxxxxxxxxxx> · Sat, 18 Dec 2010 23:55:15 +0300

On Sat, Dec 18, 2010 at 11:13:37AM -0500, Jeff King wrote:
> On Sat, Dec 18, 2010 at 05:54:12PM +0300, Kirill Smelkov wrote:
> 
> > It turned out, under blame there are requests to fill_textconv() with
> > sha1=0000000000000000000000000000000000000000 and sha1_valid=0.
> > 
> > As the code did not analyzed sha1 validity, we ended up putting 000000
> > into textconv cache which was fooling later blames to discover lots of
> > lines in 'Not Yet Committed' state.
> > [...]
> > -	if (driver->textconv_cache) {
> > +	if (driver->textconv_cache && df->sha1_valid) {
> >  		*outbuf = notes_cache_get(driver->textconv_cache, df->sha1,
> >  					  &size);
> 
> In short:
> 
>   Acked-by: Jeff King <peff@xxxxxxxx>
> 
> But it took some thinking to convince myself, so the long answer is
> below if anyone cares.
> 
> I was dubious at first that this could be the right solution. We still
> end up putting the filespec through run_textconv, which didn't seem
> right if it is not valid.
> 
> But reading into it more, there are two levels of invalidity:
> 
>   1. !DIFF_FILE_VALID(df) - we are not a valid file at all. I.e., we are
>      /dev/null.
> 
>   2. !df->sha1_valid - we are pointing to a working tree file whose sha1
>      we don't know
> 
> I think level (2) never happens at all in the regular diff code, which
> is why this case was completely unhandled. But it is OK in that case
> (required, even) to put the contents through run_textconv.
> 
> In theory we could actually calculate the sha1 in case (2) and cache
> under that, but I don't know how much it would buy us in practice. It
> saves us running the textconv filter at the expense of computing the
> sha1. Which is probably a win for most filters, but on the other hand,
> it is the wrong place to compute such a sha1. If it is a working tree
> file, we should ideally update our stat info in the index so that the
> info can be reused.

Jeff,

Thanks for your ACK and for the explanation.

My last patches to git were blame related so semi-intuitively I knew
that invalid sha1 are coming from files in worktree. Your description
makes things much more clear and I'd put it into patch log as well.
What is the best practice for this? For me to re-roll, or for Junio to
merge texts?

Also experimenting shows that, as you say, regular diff does not have this
problem, and also that diff calculates sha1 for files in worktree and so
their textconv results are cached. So clearly, there are some behaviour
differences between diff and blame.

Thanks again,
Kirill
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html