Re: RFC: Flat directory for notes, or fan-out? Both!

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Tue, 10 Feb 2009 11:09:09 -0800

Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> I wonder if we can solve this by introducing a local cache that is a flat
> file that looks like:
> 
>     magic number for /usr/bin/file

Don't forget a version number.  Waste 4 bytes now and it's easier
to change the format in the future if we need to.

>     tree object SHA-1 the file caches
>     Number of entries in this file
>     256 fan-out offsets into this file
>     N entries of <SHA-1, SHA-1>, sorted
>     Checksum of the file itself
> 
> and use it when available (otherwise optionally create it upon the first
> lookup).  The file can be used by mmaping it and then doing a Newton-Raphson
> or binary search similar to the way patch-ids.c does.

Yup.  Sort of my thoughts when I was thinking about that external
index for a "git database".
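
A minimal C sketch of the lookup Junio describes, assuming the file has
already been mmap'ed and parsed into the hypothetical struct notes_cache
below (the field names, the 20-byte SHA-1 length and host byte order are
illustrative assumptions, not a proposed on-disk encoding).  It uses a
plain binary search bounded by the fan-out table rather than the
Newton-Raphson style interpolation mentioned above:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAWSZ 20 /* bytes in a binary SHA-1 */

/* One <object SHA-1, note SHA-1> pair; hypothetical names. */
struct note_entry {
	unsigned char object[RAWSZ];
	unsigned char note[RAWSZ];
};

/* Parsed header of the flat cache file; layout is illustrative. */
struct notes_cache {
	uint32_t magic;
	uint32_t version;          /* the extra 4 bytes suggested above */
	unsigned char tree[RAWSZ]; /* notes tree this file caches */
	uint32_t nr;               /* number of entries */
	uint32_t fanout[256];      /* fanout[b]: entries with object[0] <= b */
	const struct note_entry *entries; /* nr entries, sorted by object */
};

/* Fill fanout[] from the sorted entries (done once, when writing). */
static void build_fanout(struct notes_cache *c)
{
	uint32_t i;
	int b;

	memset(c->fanout, 0, sizeof(c->fanout));
	for (i = 0; i < c->nr; i++)
		c->fanout[c->entries[i].object[0]]++;
	for (b = 1; b < 256; b++)
		c->fanout[b] += c->fanout[b - 1];
}

/* Return the note SHA-1 for `sha1`, or NULL if it has no note. */
static const unsigned char *
notes_cache_lookup(const struct notes_cache *c, const unsigned char *sha1)
{
	/* Fan-out narrows the search to entries sharing the first byte. */
	uint32_t lo = sha1[0] ? c->fanout[sha1[0] - 1] : 0;
	uint32_t hi = c->fanout[sha1[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, c->entries[mid].object, RAWSZ);

		if (!cmp)
			return c->entries[mid].note;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

The fan-out table costs 1 KiB but means each lookup only searches the
entries sharing the key's first byte, the same trick the pack .idx
files use.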

I was considering a much more complex file layout though; one that
would permit editing without completely recopying the file every
time something changes.

More or less a traditional block-oriented on-disk M-tree, with
copy-on-write semantics for the blocks.  This would permit us to
quickly append new updates onto the end of the file, and then
periodically copy and flatten out the file as necessary to
reclaim the prior dead space.

E.g.:

  magic number
  version
  [intermediate blocks ...]
  [leaf blocks...]
  root block

Writers would append modified leaf and intermediate blocks as
necessary to the end of the file, then append a new root block.
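
The append-only write path can be sketched in C as follows.  This is a
toy model, not a proposed format: the "file" is an in-memory array of
fixed-size blocks addressed by index, keys are 32-bit integers instead
of SHA-1s, there is a single root over unsorted leaves, and node
splitting is omitted.  All names are hypothetical.  The point is the
ordering: existing blocks are never modified, the copied leaf is
appended first, and the new root is appended last so it becomes the
file tail:

```c
#include <stdint.h>

/* Toy model: every name here is hypothetical. */
#define NKEYS 4
#define MAXBLOCKS 64
#define FAIL UINT32_MAX

enum { LEAF = 1, ROOT = 2 };

struct block {
	uint32_t type;
	uint32_t nr;
	uint32_t key[NKEYS]; /* leaf: keys; root: lowest key of each child */
	uint32_t val[NKEYS]; /* leaf: values; root: child block index */
};

struct blockfile {
	struct block blk[MAXBLOCKS];
	uint32_t nr; /* blocks appended so far; the tail is blk[nr - 1] */
};

/* Append one block at the end of the file and return its index. */
static uint32_t append(struct blockfile *f, const struct block *b)
{
	if (f->nr == MAXBLOCKS)
		return FAIL;
	f->blk[f->nr] = *b;
	return f->nr++;
}

/*
 * Copy-on-write update of key -> val under the root at root_idx.
 * Returns the index of the new root, or FAIL.  Old blocks are left
 * untouched and become dead space once nothing refers to them.
 */
static uint32_t cow_set(struct blockfile *f, uint32_t root_idx,
			uint32_t key, uint32_t val)
{
	struct block root = f->blk[root_idx]; /* private copy of root */
	struct block leaf;
	uint32_t c = 0, i, leaf_idx;

	/* Choose the last child whose lowest key is <= key. */
	for (i = 1; i < root.nr; i++)
		if (root.key[i] <= key)
			c = i;

	leaf = f->blk[root.val[c]]; /* private copy of the leaf */
	for (i = 0; i < leaf.nr; i++)
		if (leaf.key[i] == key)
			break;
	if (i == leaf.nr) {
		if (leaf.nr == NKEYS)
			return FAIL; /* a real tree would split here */
		leaf.key[leaf.nr++] = key;
	}
	leaf.val[i] = val;

	leaf_idx = append(f, &leaf); /* 1. modified leaf */
	if (leaf_idx == FAIL)
		return FAIL;
	root.val[c] = leaf_idx;
	return append(f, &root); /* 2. new root, last */
}
```

A reader still holding the old root index keeps seeing the old tree;
only a reader that picks up the new tail sees the update.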

Readers would read the file tail and verify it is a root, then scan
with a traditional M-tree search algorithm.

If the root block has a "magic block header" and a strong checksum
at the tail of the block, readers can concurrently read while a
writer is appending.  Any invalid root block just means the reader
is seeing the middle of a write, or an aborted write, and should
scan backwards to locate the prior valid root.
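
A sketch of that reader-side scan, assuming every block is a fixed
BLOCK_SIZE bytes so the reader can step backwards block by block, a
hypothetical root layout of [magic][payload][checksum], and host byte
order.  FNV-1a is only a placeholder to keep the example short; it is
not the strong checksum asked for above, and a real format would use
something like SHA-1:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 32           /* hypothetical fixed block size */
#define ROOT_MAGIC 0x524f4f54u  /* hypothetical root block header */

/* Placeholder checksum (32-bit FNV-1a), NOT a strong checksum. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;

	while (n--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Root layout: [magic:4][payload:BLOCK_SIZE-8][checksum:4]. */
static int is_valid_root(const unsigned char *blk)
{
	uint32_t magic, sum;

	memcpy(&magic, blk, 4);
	memcpy(&sum, blk + BLOCK_SIZE - 4, 4);
	return magic == ROOT_MAGIC && sum == fnv1a(blk, BLOCK_SIZE - 4);
}

/* Writer side: stamp the magic, then checksum everything before it. */
static void seal_root(unsigned char *blk)
{
	uint32_t magic = ROOT_MAGIC, sum;

	memcpy(blk, &magic, 4);
	sum = fnv1a(blk, BLOCK_SIZE - 4);
	memcpy(blk + BLOCK_SIZE - 4, &sum, 4);
}

/*
 * Return the offset of the newest valid root, or -1 if none.  A
 * partial trailing block (a writer mid-append) is skipped, and any
 * block failing the magic/checksum test is treated as "not a root".
 */
static long find_root(const unsigned char *file, size_t len)
{
	size_t off = len - len % BLOCK_SIZE;

	while (off >= BLOCK_SIZE) {
		off -= BLOCK_SIZE;
		if (is_valid_root(file + off))
			return (long)off;
	}
	return -1;
}
```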

If the root block also has a commit SHA-1 indicating which commit
that root became valid under, a reader can decide if that root
might give it answers which aren't correct for the current value of
the notes history it is reading, and scan backwards for some older
root block.  We could accelerate that by including the file offset
of the prior root block in each new root.
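
The commit check plus the prior-root offset could look like the sketch
below.  Again a toy: the file is an array of blocks indexed by
position, ROOT_NONE and the field names are hypothetical, and "usable"
is simplified to "built under exactly the commit the reader is
looking at".  Following the prev_root links means the backward walk
touches only root blocks, not every block in between:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAWSZ 20
#define ROOT_NONE UINT64_MAX /* hypothetical "no prior root" marker */

struct block {
	int is_root;
	unsigned char commit[RAWSZ]; /* roots: commit it became valid under */
	uint64_t prev_root;          /* roots: offset of the prior root */
	/* keys, child pointers, magic and checksum omitted */
};

/*
 * Starting from a verified root at `tail`, follow prev_root links
 * until a root built under `want` is found.  `visited` counts the
 * blocks examined.  Returns NULL if no such root exists.
 */
static const struct block *
find_root_for_commit(const struct block *file, uint64_t tail,
		     const unsigned char *want, unsigned *visited)
{
	uint64_t off = tail;

	*visited = 0;
	while (off != ROOT_NONE) {
		const struct block *r = &file[off];

		(*visited)++;
		if (!r->is_root)
			return NULL; /* broken chain: bail out */
		if (!memcmp(r->commit, want, RAWSZ))
			return r;
		off = r->prev_root; /* skip straight to the older root */
	}
	return NULL;
}
```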

Compacting the file during GC is just a matter of write-locking the
file to block out new writers, then traversing the current root and
copying all blocks that are reachable from it.
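
Compaction under the write lock can be sketched as a post-order copy
from the current root, so children are written before their parents
and the root lands at the tail, where readers expect it.  The
in-memory arrays stand in for the old and new files; the block
structure and names are hypothetical, and writing the new file and
renaming it over the old one is an assumption about how the swap
would be done, not something described here:

```c
#include <stdint.h>

#define MAXKIDS 4
#define NONE UINT32_MAX /* "not copied yet" marker in remap[] */

struct block {
	uint32_t nkids;
	uint32_t kid[MAXKIDS]; /* indices of child blocks */
	uint32_t payload;      /* stands in for keys and values */
};

/*
 * Copy the subtree at `idx` from `old` into `out`, children first,
 * rewriting child indices as we go.  remap[] memoizes blocks that
 * were already copied.  Returns the block's new index.
 */
static uint32_t copy_reachable(const struct block *old, uint32_t idx,
			       struct block *out, uint32_t *out_nr,
			       uint32_t *remap)
{
	struct block b;
	uint32_t i;

	if (remap[idx] != NONE)
		return remap[idx];
	b = old[idx];
	for (i = 0; i < b.nkids; i++)
		b.kid[i] = copy_reachable(old, b.kid[i], out, out_nr, remap);
	out[*out_nr] = b;
	remap[idx] = *out_nr;
	return (*out_nr)++;
}

/*
 * Compact `old` (old_nr blocks) starting at `root`; the caller is
 * assumed to hold the write lock.  Unreachable (dead) blocks are not
 * copied.  Returns the number of blocks written to `out`.
 */
static uint32_t gc_compact(const struct block *old, uint32_t old_nr,
			   uint32_t root, struct block *out, uint32_t *remap)
{
	uint32_t i, out_nr = 0;

	for (i = 0; i < old_nr; i++)
		remap[i] = NONE;
	copy_reachable(old, root, out, &out_nr, remap);
	return out_nr;
}
```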

</end-hand-waving>

> I am hoping that I could eventually rewrite rerere to use something like
> this, so that rerere database can be shared, just like the way notes can
> be shared, across repositories.

Ooh, great idea.  If we could toss rerere data into something that
can be transported around and efficiently accessed, that would be
really useful.  I like it.

-- 
Shawn.