Re: [PATCH 09/16] documentation: add documentation for the bitmap format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jul 7, 2013 at 2:46 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Jul 01, 2013 at 11:47:32AM -0700, Colby Ranger wrote:
>
>> > But I think we are comparing
>> > apples to steaks here, Vincent is (rightfully) concerned about process
>> > startup performance, whereas our timings were assuming the process was
>> > already running.
>> >
>>
>> I did some timing on loading the reverse index for the kernel and it
>> is pretty slow (~1200ms). I just submitted a fix to do a bucket sort
>> and reduced that to ~450ms, which is still slow but much better:
>
> On my machine, loading the kernel revidx in C git is about ~830ms. I
> switched the qsort() call to a radix/bucket sort, and have it down to
> ~200ms. So definitely much better,

This is a very nice reduction. pack-objects would benefit from it even
without bitmaps. Since it doesn't require a data format change this is
a pretty harmless patch to include in Git. We may later conclude
caching the revidx is worthwhile, but until then a bucket sort doesn't
hurt. :-)

> though that still leaves a bit to be
> desired for quick commands. E.g., "git rev-list --count A..B" should
> become fairly instantaneous with bitmaps, but in many cases the revindex
> loading will take longer than it would have to simply do the actual
> traversal.

Yea, we don't know of a way around this. In a few cases the bitmap
code in JGit is slower than the naive traversal, but these are only on
small segments of history. I wonder if you could guess which algorithm
to use by looking at the offsets of A and B using the idx file. If
they are near each other in the pack, run the naive algorithm without
bitmaps and revidx. If they are farther apart assume the bitmap would
help more than traversal and use bitmap+revidx.

Working out what the correct "distance" should be before switching
algorithms is hard. A and B could be megabytes apart in the pack but A
could be B's grandparent and traversed in milliseconds. I wonder how
often that is in practice, certainly if A and B are within a few
hundred kilobytes of each other the naive traversal should be almost
instant.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]