Re: serious performance issues with images, audio files, and other "non-code" data

On Tue, May 18, 2010 at 09:10:58PM +0200, Sverre Rabbelier wrote:

> On Tue, May 18, 2010 at 21:07, Jeff King <peff@xxxxxxxx> wrote:
> > No, not to my knowledge. Even the "binary" attribute just says "this
> > file is binary, don't text diff it". I think we will always still do
> > rewrite-detection for operations like "git status" and the diff summary
> > of "git commit".
> 
> Would that not be a very sensible optimization that would help John
> (and other users of big files) a lot?

It might help some, but I worry about overloading the meaning of
"-delta". Right now it has a very clear meaning: don't delta for
packfiles. Setting it doesn't mean I will never want break detection
(or inexact rename detection, for that matter) on that file.

Large binary files shouldn't be taxing on regular diffs. If you have
marked a file as "binary" and we are not creating a binary diff (i.e.,
we are just printing "binary files differ"), then we shouldn't even need
to pull the blob from storage, since we can tell from the sha1 that it
is different. I haven't checked whether we actually do that simple
optimization. (If you haven't marked the file with the binary attribute,
then obviously we do have to look at the blob to find out that it is
binary.)
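
To make the short-circuit concrete, here is a rough Python sketch. It
is not git's actual code path: summarize_change and load_blob are
hypothetical names, and the NUL-byte check on the first 8000 bytes is
only an assumed stand-in for content-based binary detection.

```python
import hashlib


def blob_oid(data: bytes) -> str:
    """Git-style blob ID: SHA-1 over a "blob <size>\\0" header plus the content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def summarize_change(path, old_oid, new_oid, marked_binary, load_blob):
    """Return a one-line summary, reading blobs only when unavoidable.

    load_blob is a hypothetical callback mapping an object ID to its bytes.
    """
    if old_oid == new_oid:
        return None  # same ID means same content; nothing to report
    if marked_binary:
        # The attribute already says "binary" and the IDs differ,
        # so neither blob has to be pulled from storage.
        return f"Binary files a/{path} and b/{path} differ"
    # No attribute: the content must be inspected to learn it is binary.
    old, new = load_blob(old_oid), load_blob(new_oid)
    if b"\0" in old[:8000] or b"\0" in new[:8000]:  # assumed heuristic
        return f"Binary files a/{path} and b/{path} differ"
    return f"text diff needed for {path}"
```

The point is only that the attribute-marked branch never calls
load_blob, while the unmarked branch has to read both sides.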

So:

  1. I think it would need a separate attribute that is about diffing
     (possibly even just options to a custom diff driver).

  2. I am not clear exactly what options would work best. Do you want to
     disable diffing entirely? Disable just inexact rename detection and
     break detection? If break detection is disabled, do you assume it
     is _always_ a rewrite, or never?

So I am open to the idea, but I think we would need a more concrete
proposal and some timings to show how it is a benefit.
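
For the timings, something as small as the following Python sketch
would probably do. The helper names time_command and summarize are
made up for illustration, and the repository path in the usage comment
is a placeholder; the git invocations are just examples of commands
worth measuring before and after a change.

```python
import statistics
import subprocess
import time


def time_command(argv, cwd=None, repeats=5):
    """Run argv `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to its minimum and median."""
    return {"min": min(durations), "median": statistics.median(durations)}


# Example usage (placeholder path, run inside a repository with large files):
#   summarize(time_command(["git", "status"], cwd="/path/to/repo"))
#   summarize(time_command(["git", "diff", "-B", "-M", "--stat", "HEAD"],
#                          cwd="/path/to/repo"))
```

Reporting the minimum and the median over a few runs helps separate
cold-cache noise from the cost of the diff machinery itself.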

-Peff