On 02/18/2010 06:36 AM, Junio C Hamano wrote:
Nicolas Pitre <nico@xxxxxxxxxxx> writes:
It is likely to perform better if the buffer is small enough to
fit in the CPU's L1 cache. There are two sequential passes over the
buffer: one for the SHA1 computation and another for the compression,
and currently they're sure to thrash the L1 cache on each pass.
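To make the point concrete, here is a minimal sketch (not git's actual code) of interleaving the two passes: feed the same small chunk to the SHA1 context and to the deflate stream before moving on, so the second pass touches data that is still hot in the L1 cache. The 4 kB chunk size is an assumption taken from the numbers later in this thread.

```python
import hashlib
import zlib

CHUNK = 4 * 1024  # hypothetical cache-friendly window size


def hash_and_deflate(data: bytes) -> tuple[str, bytes]:
    """Interleave SHA-1 and deflate over small chunks of `data`."""
    sha = hashlib.sha1()
    comp = zlib.compressobj()
    out = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        sha.update(chunk)                 # pass 1 over this chunk
        out.append(comp.compress(chunk))  # pass 2, chunk still in cache
    out.append(comp.flush())
    return sha.hexdigest(), b"".join(out)
```

The results are byte-identical to hashing and compressing the whole buffer in two separate passes; only the memory access pattern changes.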
I did a very unscientific test hashing about 14k paths (arch/ and fs/ from
the kernel source) using "git-hash-object -w --stdin-paths" into an empty
repository with varying sizes of paranoia buffer (a quarter, 1, 4, 8 and
256kB) and saw 8-30% overhead. 256kB did hurt, and around 4kB seemed to be
optimal for this small sample load.
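An equally unscientific micro-benchmark in the same spirit can be sketched like this: time the chunked SHA-1 + deflate loop at several buffer sizes and compare. The sizes mirror the ones tried above; absolute timings will of course vary by machine, so no particular output is claimed.

```python
import hashlib
import os
import time
import zlib

DATA = os.urandom(1 << 20)  # 1 MB of incompressible sample data


def run(chunk_size: int) -> float:
    """Time one SHA-1 + deflate pass over DATA at the given chunk size."""
    sha = hashlib.sha1()
    comp = zlib.compressobj()
    start = time.perf_counter()
    for off in range(0, len(DATA), chunk_size):
        piece = DATA[off:off + chunk_size]
        sha.update(piece)
        comp.compress(piece)
    comp.flush()
    return time.perf_counter() - start


if __name__ == "__main__":
    for size in (256, 1024, 4096, 8192, 256 * 1024):
        print(f"{size:>7} B: {run(size):.4f} s")
```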
In any case, with any size of paranoia buffer, this hurts the sane use case.
That's because by mmapping + memcpying you get the worst of both worlds:
you take a page fault per page, as with mmap, and you touch the memory
twice, as with read.
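The contrast can be illustrated with a small sketch of the two strategies, here in Python rather than git's C for brevity: read() copies the file into our buffer in a single pass, while mmap + an explicit copy first faults each page in and then touches it again for the memcpy.

```python
import mmap
import os
import tempfile

payload = b"x" * (1 << 16)  # 64 kB sample file

fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)

# Strategy 1: read() -- the kernel fills our buffer in one pass.
with open(path, "rb") as f:
    via_read = f.read()

# Strategy 2: mmap + copy -- a page fault per page on first touch,
# then a second pass over the same memory for the copy.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        via_mmap = bytes(m)  # the extra "memcpy" pass

os.unlink(path)
assert via_read == via_mmap == payload
```

Both yield the same bytes; the mmap + copy variant simply pays for the page faults *and* the extra traversal, which is the "worst of both worlds" point above.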
Paolo