Hi, On Mon, 2 Jun 2008, Geoffrey Irving wrote: > On Sun, Jun 1, 2008 at 11:42 PM, Jeff King <peff@xxxxxxxx> wrote: > > On Mon, Jun 02, 2008 at 07:13:14AM +0100, Johannes Schindelin wrote: > > > >> I do not think that this "read-the-entire-table-into-memory" paradigm > >> is a wise choice. mmap()ing, I would have understood, but reading a > >> potentially pretty large table into memory? > > > > When I was just a git-youth, I wrote a fast mmap-based cache for > > storing SHA1 pairs. It might give some direction. You should be able > > to find it here: > > > > http://mid.gmane.org/20060629035849.GA30749@xxxxxxxxxxxxxxxxxxxxxxx > > > > It mmaps and binary searches a sorted list. New entries are added to > > an in-memory list, and then at the end of a run, the two sorted lists > > are merged to create the new on-disk version. > > I don't need sorting (and neither did you), so I think a hash table is > better (O(1) instead of O(log n), and we don't even need to compute hash > keys. I'll leave it up to you and Dscho (or anyone else who cares to > chime in) which one you think I should do. My tests suggested that the lookup time advantage of hashes makes them a more appropriate choice than sorted lists, if you look up often, but add rarely. Another issue that just hit me: this cache is append-only, so if it grows too large, you have no other option than to scratch and recreate it. Maybe this needs porcelain support, too? (git gc?) Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html