Sure, but the data buffers aren't necessarily so. At least some code to
align index_step with the actual memory offset, if necessary, should be
considered.
I don't quite understand. Do you mean that if I mmap a file, I can't
count on the memory being word-aligned? Note that I traverse the
target buffer byte-by-byte, but the matches in the source buffer are
always aligned to index_step. Doing a word load there instead of
individual byte loads is actually a significant speedup on PPC and
most other non-Intel platforms.
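To make the two options concrete, here is a minimal sketch; it is not the
code from the patch under discussion. INDEX_STEP and the helpers
load32_bytes, load32_word and align_skip are hypothetical names chosen for
illustration, and the step value of 4 is an assumption. It shows a byte-wise
load, a word load that stays valid even if the buffer turns out not to be
word-aligned, and how many bytes would have to be skipped to line the index
up with the actual memory address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical step size; the thread only says matches are aligned to it. */
#define INDEX_STEP 4u

/* Four individual byte loads, combined in big-endian order.
 * Works at any address on any platform. */
static uint32_t load32_bytes(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Word load expressed through memcpy. This is well defined for any
 * alignment, and compilers commonly reduce it to a single load when the
 * target allows it. The result is in native byte order, so it is only
 * meant to be compared against other values loaded the same way. */
static uint32_t load32_word(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Number of bytes to skip so that buf + skip sits on an INDEX_STEP
 * boundary in memory. Assumes the usual flat mapping from pointers to
 * uintptr_t, which holds on mainstream platforms. */
static size_t align_skip(const void *buf)
{
    uintptr_t addr = (uintptr_t)buf;

    assert((INDEX_STEP & (INDEX_STEP - 1)) == 0); /* power of two */
    return (size_t)((INDEX_STEP - (addr % INDEX_STEP)) % INDEX_STEP);
}
```

If the buffer start is already aligned (which is what one would normally
expect from mmap), align_skip returns 0 and offsets that are multiples of
INDEX_STEP are also aligned addresses.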
Indeed. And since the primary goal for GIT is to manage relatively
small files with relatively few differences, we have to optimize for
that case while simply trying to limit the damage in the other cases.
Yes, I'll look at those now. Also, comparing one index against
10 target files may change the profile quite a bit. The new
algorithm spends most time on indexing, but when comparing against
many files, the find_copy part suddenly becomes dominant.
Well, I did lots of benchmarks too over the Linux kernel repository
with the current code. It is of course a dataset quite different from
two large files. And simply increasing the hash size to improve on pack
size did increase CPU usage quite significantly.
BTW, I still get:
potomac%git-rev-list --objects e64961b0573b0e72bd55eab6d36bd97f859f9516 | ./git-pack-objects --no-reuse-delta --stdout
Generating pack...
Done counting 17005 objects.
Deltifying 17005 objects.
100% (17005/17005) done
fatal: delta size changed
(This is for my git.git tree.)
PS. Somehow your code had "double line spacing" :)
Gah. Your original version must have CRLF line terminations, and vi
simply notices that and writes the file back with CRLF by default. Find
attached a version with those converted to LF only.
Strange, my platform doesn't use CRLF, and my sources all have
pristine LF line terminations.
-Geert