Re: What's cooking in git.git (topics)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> On Tue, Oct 02, 2007 at 01:08:20AM -0400, Jeff King wrote:
>
>> One approach which I haven't tried but might be promising is to actually
>> keep each list sorted, and then do a "merge" of the two lists, comparing
>> as you go. We don't really need to do arbitrary lookups in the hash; we
>> just need to compare two hash tables at a time. My approach was to be
>> simple, but have O(HASH_SIZE) comparisons (where HASH_SIZE is on the
>> order of 2^17), and that's clearly just too big. But with a list merge,
>> it should be O(n), where n is the actual number of lines in the files
>> (or binary chunks for the binary case).
>
> BTW, I don't want to steal credit for this idea...it comes from thinking
> about what David Kastrup said earlier in the thread, though I think he
> was proposing sorting just inside buckets.

Yes: my proposal was about a microoptimization: work with the
basically existing data structures and put the already contained
information to best use.

I have not actually looked at the actual task that the structures are
going to be used in, and whether "reusing" the information is likely
to be worth the trouble.

When we are talking about buzzword compliance, "keep sorted" with the
meaning of "maintain sorted across modifications" has an O(n^2) or at
least O(nm) ring to it.  However, if it is possible to sort it just
once, and then then only merge with other lists...

I am actually quite a fan of merge sort and have even posted a small
and quite efficient version to this list once.  However, merge sorts
were really greatest at the time when cache memory was unusual to
have.  Nowadays, quicksort or similar could be faster due to better
locality of memory accesses.  I think the glibc qsort more or less
uses an array-based merge into a separate memory area (unless it runs
out of memory in which case it resorts to regular quicksort).

-- 
David Kastrup

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux