Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> I wonder if we can solve this by introducing a local cache that is a flat
> file that looks like:
>
>     magic number for /usr/bin/file

Don't forget a version number.  Waste 4 bytes now and it's easier
to change the format in the future if we need to.

>     tree object SHA-1 the file caches
>     Number of entries in this file
>     256 fan-out offsets into this file
>     N entries of <SHA-1, SHA-1>, sorted
>     Checksum of the file itself
>
> and use it when available (otherwise optionally create it upon the first
> lookup).  The file can be used by mmapping it and then doing a
> Newton-Raphson or binary search similar to the way patch-ids.c does.

Yup.  Sort of my thoughts when I was thinking about that external
index for a "git database".

I was considering a much more complex file layout though; one that
would permit editing without completely recopying the file every
time something changes.  More or less a traditional block-oriented
on-disk M-tree, with copy-on-write semantics for the blocks.  This
would permit us to quickly append new updates onto the end of the
file, and then periodically copy and flatten out the file as
necessary to reclaim the prior dead space.

E.g.:

    magic number
    version
    [intermediate blocks ...]
    [leaf blocks ...]
    root block

Writers would append modified leaf and intermediate blocks as
necessary to the end of the file, then append a new root block.

Readers would read the file tail and verify it is a root, then scan
with a traditional M-tree search algorithm.

If the root block has a "magic block header" and a strong checksum
at the tail of the block, readers can concurrently read while a
writer is appending.  Any invalid root block just means the reader
is seeing the middle of a write, or an aborted write, and should
scan backwards to locate the prior valid root.

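To make the reader side of that concrete, here is a rough sketch in
Python (for brevity, not a proposal for the real implementation).
Everything specific in it is an assumption made up for illustration:
the magic values, the 8-byte file header, fixed-size 64-byte blocks,
and SHA-1 as the "strong checksum" stored in the last 20 bytes of a
root block.  It only shows how a reader could skip a torn or corrupt
tail and fall back to the prior valid root.

```python
import hashlib
import io
import struct

# All constants below are hypothetical, chosen only for this sketch.
FILE_MAGIC = b"GNDX"                     # file magic, for /usr/bin/file
ROOT_MAGIC = b"ROOT"                     # "magic block header" of a root
VERSION = 1
HEADER_SIZE = 8                          # file magic + 4-byte version
BLOCK_SIZE = 64                          # tiny on purpose
CSUM_SIZE = hashlib.sha1().digest_size   # 20 bytes at the block tail
PAYLOAD_SIZE = BLOCK_SIZE - len(ROOT_MAGIC) - CSUM_SIZE


def write_header(f):
    """Write the file magic and a big-endian version number."""
    f.write(FILE_MAGIC + struct.pack(">I", VERSION))


def make_root(payload):
    """Build one root block: magic, padded payload, checksum at the tail."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload too large for a root block")
    body = ROOT_MAGIC + payload.ljust(PAYLOAD_SIZE, b"\0")
    return body + hashlib.sha1(body).digest()


def find_latest_root(f):
    """Return (offset, payload) of the last valid root block, or None.

    Blocks are located by their offset from the header, so a partial
    block left behind by an in-progress or aborted write is ignored.
    """
    f.seek(0)
    header = f.read(HEADER_SIZE)
    if len(header) != HEADER_SIZE or header[:4] != FILE_MAGIC:
        raise ValueError("not an index file")
    (version,) = struct.unpack(">I", header[4:])
    if version != VERSION:
        raise ValueError("unsupported version %d" % version)

    size = f.seek(0, io.SEEK_END)
    whole_blocks = (size - HEADER_SIZE) // BLOCK_SIZE
    # Scan backwards from the tail until a block verifies as a root.
    for i in range(whole_blocks - 1, -1, -1):
        offset = HEADER_SIZE + i * BLOCK_SIZE
        f.seek(offset)
        block = f.read(BLOCK_SIZE)
        body, csum = block[:-CSUM_SIZE], block[-CSUM_SIZE:]
        if body.startswith(ROOT_MAGIC) and hashlib.sha1(body).digest() == csum:
            return offset, body[len(ROOT_MAGIC):]
    return None
```

With a file holding a leaf block, a root, another leaf, and then a
half-written second root, find_latest_root() skips the torn tail and
the leaf and returns the first root.  Scanning block by block like
this is linear in the amount of data appended since that root, which
is fine for a sketch but is the cost a real design would want to
avoid.
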
If the root block also has a commit SHA-1 indicating which commit
that root became valid under, a reader can decide if that root might
give it answers which aren't correct for the current value of the
notes history it is reading, and scan backwards for some older
root block.  We could accelerate that by including the file offset
of the prior root block in each new root.

GC compacting the file is just a matter of write-locking the file
to block out a new writer, then traversing the current root and
copying all blocks that are reachable.

</end-hand-waving>

> I am hoping that I could eventually rewrite rerere to use something like
> this, so that rerere database can be shared, just like the way notes can
> be shared, across repositories.

Ooh, great idea.  If we could toss rerere data into something that
can be transported around and efficiently accessed, I like it.

-- 
Shawn.