Re: [PATCH] xdiff-interface.c (buffer_is_binary): Remove buffer size limitation

Hi,

On Mon, 3 Dec 2007, Linus Torvalds wrote:

> On Tue, 4 Dec 2007, Dmitry V. Levin wrote:
> >
> > Average file size in the linux-2.6.23.9 kernel tree is 10944 bytes,
> 
> Don't do "average" sizes. That's an almost totally meaningless number.
> 
> "Average" makes sense if you have some kind of gaussian distribution or 
> similar.

To expand on that: a Gaussian is symmetric, so it cannot be the proper 
distribution for anything that is non-negative.

I see so many mis-applications of statistics/probability theory in my day 
job that I cannot resist pointing people to the Poisson distribution here 
(in whose context "average" actually makes kind of sense).

But back to the problem: if you have a truly binary file, then _every_ 
byte (absent further information, of course) has a probability of 1/256 of 
being 0.

Which means that if a file is binary, but is unusual enough to have that 
property only for half of the first 8192 bytes, the test is fooled only if 
none of those 4096 bytes is 0. So you get a probability of 
1 - (255/256)^4096, or roughly 1 - 1.1e-7, that the current test succeeds.
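
For anyone who wants to check the arithmetic, here is a rough sketch. It 
assumes independent, uniformly distributed bytes and a test that simply 
looks for a 0 byte in the first 8192 bytes. The function names are made up 
for illustration and this is not the actual xdiff-interface.c code.

```python
def looks_binary(data: bytes, limit: int = 8192) -> bool:
    """Hypothetical NUL-byte heuristic: inspect only the first `limit` bytes."""
    return b"\x00" in data[:limit]


def p_miss(n_random: int) -> float:
    """Probability that none of n_random uniformly random bytes is 0."""
    return (255.0 / 256.0) ** n_random


def p_detect(n_random: int) -> float:
    """Probability that at least one of n_random uniformly random bytes is 0."""
    return 1.0 - p_miss(n_random)


if __name__ == "__main__":
    # Half of the first 8192 bytes behave like random binary data.
    print(f"miss probability:   {p_miss(4096):.3g}")
    print(f"detect probability: {p_detect(4096):.9f}")
```

The miss probability for 4096 such bytes comes out on the order of 1e-7, 
so the test only fails for files whose leading bytes are far from random.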

I fail to see how this test can possibly fail for the average case.

So if it fails only for special cases, we are probably (in the common, not 
the mathematical, sense) better off asking those people encountering them 
to add git-attributes for the files.
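
A sketch of what such an entry might look like in a .gitattributes file. 
The *.dat pattern is purely a made-up example; people would use whatever 
pattern matches their own special-case files.

```
# .gitattributes (hypothetical pattern)
*.dat binary
```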

IMHO that is not asking for too much.

Ciao,
Dscho
