Re: [PATCH v3 0/6] Create commit-graph file format v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 03 2019, Derrick Stolee wrote:

> On 5/2/2019 2:02 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> But those are separate from any back-compat concerns, which is what I
>> think makes sense to focus on now.
>
> Thinking more on this topic, I think I have a way to satisfy _all_ of
> your concerns by simplifying the plan for incremental commit-graph files.
>
> My initial plan was to have the "commit-graph" file always be the "tip"
> file, and it would point to some number of "graph-{hash}.graph" files.
> Then, we would have some set of strategies to decide when we create a new
> .graph file or when we compact the files down into the "commit-graph"
> file.
>
> This has several issues regarding race conditions that I had not yet
> resolved (and maybe would always have problems).
>
> It would be much simpler to restrict the model. Your idea of changing
> the file name is the inspiration here.

I still have some questions about SHA-256, how we'll eventually change
the format of these new files etc, but those can wait...

> * The "commit-graph" file is the base commit graph. It is always
>   closed under reachability (if a commit exists in this file, then
>   its parents are also in this file). We will also consider this to
>   be "commit-graph-0".
>
> * A commit-graph-<N> exists, then we check for the existence of
>   commit-graph-<N+1>. This file can contain commits whose parents
>   are in any smaller file.
>
> I think this resolves the issue of back-compat without updating
> the file format:
>
> 1. Old clients will never look at commit-graph-N, so they will
>    never complain about an "incomplete" file.
>
> 2. If we always open a read handle as we move up the list, then
>    a "merge and collapse" write to commit-graph-N will not
>    interrupt an existing process reading that file.
>
> I'll start hacking on this model.

Just on this, consider storing them in
.git/objects/info/commit-graphs/commit-graph-<THIS-FILE'S-CHECKSUM-SHA1>,
because:

1) We can stat() the "commit-graphs" directory to see if there's any
   new/deleted ones (dir mtime changed), similar to what we do for the
   untracked cache, and can (but I don't think we do...) do for packs/*
   and objects/??/.

   As opposed to ".git/objects/info" itself which e.g. has the "packs",
   "alternates" etc. files (can still do it, but more false positives)

2) If these things are "chained" supersets of one another anyway we have
   a big caveat that we don't have with repacking the *.pack
   files.

   Those you can do in any order, as long as you write a new one it
   doesn't matter if the data is already elsewhere stored. Just repack
   however many N at a time, then unlink() all the old ones later.

   If you have commit-graph-<N>, commit-graph-<N+1> you can only munge
   them in that sequence, and you get to do a much more careful fsync()
   dance around with both the directory entry and the file itself, which
   has a much higher chance of breaking things.

   I.e. the client might see a change to "N" before "N+1", and now we're
   screwed. Dealing with this is a giant but avoidable PITA. See
   https://public-inbox.org/git/20180117184828.31816-1-hch@xxxxxx/ and
   http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/

   If instead you name the files commit-graph-<MY-OWN-SHA>, and add a
   new (v1 backwards-compatible) chunk for "and the next file's SHA-1
   is..."  we can more aggressively repack these.

   Worst case the directory entries will sync in the wrong order and the
   chain will be broken, but as long as we gracefully handle the chain
   abruptly ending that's OK.

   You can also do things like rewrite the middle "[]" of a "N, [N+1,
   N+2], N+3" sequence to have a single file for what was previously
   within those brackets (again, like "repack"). You'd just need to drop
   in a new "N" that points to the replacement for "[N+1, N+2]", but
   *don't* need to touch "N+3".

   Whereas if they're numbered you need to also move "N+3", and you're
   back to juggling both fsync() on files and directory entries.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux