Re: [PATCH v4 03/10] commit-graph: compute generation numbers

On 4/29/2018 5:08 AM, Jakub Narebski wrote:
Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:

While preparing commits to be written into a commit-graph file, compute
the generation numbers using a depth-first strategy.
Sidenote: for generation numbers it does not matter whether we use a
depth-first or a breadth-first strategy, but depth-first search is the
more natural choice because generation numbers need post-order processing
(parents before child).

The only commits that are walked in this depth-first search are those
without a precomputed generation number. Thus, computation time will be
relative to the number of new commits to the commit-graph file.
A question: what happens if the existing commit graph is from an older
version of Git and has _ZERO for generation numbers?

Answer: I see that we treat both _INFINITY (not in commit-graph) and
_ZERO (in commit-graph but not computed) as generation numbers that are
not yet computed.  All right.

If a computed generation number would exceed GENERATION_NUMBER_MAX, then
use GENERATION_NUMBER_MAX instead.
All right, though I guess this would remain theoretical for a long
while.

We don't have any way of testing this, at least not without recompiling
Git with a lower value of GENERATION_NUMBER_MAX -- which means it cannot
be tested automatically, can it?
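
One way this saturation could be exercised without rebuilding Git is to
factor it into a helper that takes the cap as a parameter.  The sketch
below is hypothetical: next_generation() and its cap argument do not
exist in commit-graph.c, and 0x3FFFFFFF is only assumed here as the value
of GENERATION_NUMBER_MAX (the largest number that fits in 30 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration: the largest number that fits in 30 bits. */
#define GENERATION_NUMBER_MAX 0x3FFFFFFFu

/*
 * Hypothetical helper (not part of the patch): return the generation
 * number for a commit whose highest parent generation is
 * max_parent_generation, saturating at cap.  Comparing before adding
 * avoids any overflow in the "+ 1".
 */
static uint32_t next_generation(uint32_t max_parent_generation, uint32_t cap)
{
	assert(cap > 0);
	if (max_parent_generation >= cap)
		return cap;
	return max_parent_generation + 1;
}
```

With the cap as a parameter, a test can pass a tiny value such as 5 and
check that a chain of six or more commits stops at 5, while production
code would pass GENERATION_NUMBER_MAX.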

Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
---
  commit-graph.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 9ad21c3ffb..047fa9fca5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -439,6 +439,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
  		else
  			packedDate[0] = 0;
+		if ((*list)->generation != GENERATION_NUMBER_INFINITY)
+			packedDate[0] |= htonl((*list)->generation << 2);
+
If we stumble upon a commit marked as "not in commit-graph" while writing
the commit graph, it is a BUG(), isn't it?

(Problem noticed by Junio.)

Since we are computing the values for all commits in the list, this condition is not important and will be removed.


It is a bit strange to me that the code uses get_be32 for reading, but
htonl for writing.  Is Git tested on machines that are not little-endian,
like big-endian ppc64 or s390x, or on mixed-endian machines (or
selectable-endian machines with data endianness set to something other
than little-endian, like ia64)?  If not, could we use, for example, the
openSUSE Build Service (https://build.opensuse.org/) for this?

Since we are packing two values into 64 bits, I am using htonl() here to arrange the 30-bit generation number alongside the 34-bit commit date value, then writing with hashwrite(). The other 32-bit integers are written with hashwrite_be32() to avoid translating this data in-memory.
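
To make the 64-bit layout concrete, here is a minimal sketch that writes
the bytes explicitly in network order instead of relying on htonl(), so
it behaves the same on any host endianness.  It assumes the layout implied
by the hunk above: the generation number in the upper 30 bits of the first
32-bit word, the top 2 bits of the 34-bit date in its lower 2 bits, and
the low 32 bits of the date in the second word.  The helper names
(store_be32, load_be32, pack_generation_and_date,
unpack_generation_and_date) are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration: the largest number that fits in 30 bits. */
#define GENERATION_NUMBER_MAX 0x3FFFFFFFu

/* Store a 32-bit value most-significant byte first (network order). */
static void store_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Load a 32-bit value stored most-significant byte first. */
static uint32_t load_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Pack a 30-bit generation number and a 34-bit date into 8 bytes. */
static void pack_generation_and_date(unsigned char out[8],
				     uint32_t generation, uint64_t date)
{
	assert(generation <= GENERATION_NUMBER_MAX);
	assert(date < ((uint64_t)1 << 34));

	/* word 0: generation in bits 31..2, date bits 33..32 in bits 1..0 */
	store_be32(out, (generation << 2) | (uint32_t)((date >> 32) & 0x3));
	/* word 1: low 32 bits of the date */
	store_be32(out + 4, (uint32_t)(date & 0xFFFFFFFFu));
}

/* Inverse of pack_generation_and_date(). */
static void unpack_generation_and_date(const unsigned char in[8],
				       uint32_t *generation, uint64_t *date)
{
	uint32_t hi = load_be32(in);
	uint32_t lo = load_be32(in + 4);

	*generation = hi >> 2;
	*date = ((uint64_t)(hi & 0x3) << 32) | lo;
}
```

Because each byte is placed by shifting, there is no in-memory value whose
meaning depends on the host byte order, which is the property the
endianness question above is about.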


  		packedDate[1] = htonl((*list)->date);
  		hashwrite(f, packedDate, 8);
@@ -571,6 +574,46 @@ static void close_reachable(struct packed_oid_list *oids)
  	}
  }
+static void compute_generation_numbers(struct commit** commits,
+				       int nr_commits)
+{
+	int i;
+	struct commit_list *list = NULL;
All right, commit_list will work as stack.

+
+	for (i = 0; i < nr_commits; i++) {
+		if (commits[i]->generation != GENERATION_NUMBER_INFINITY &&
+		    commits[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
All right, we consider _INFINITY and _ZERO as not computed.  If the
generation number is already computed (by 'recursion' or from the commit
graph), we (re)use it.  This means that the generation number calculation
is incremental, as intended -- good.

+
+		commit_list_insert(commits[i], &list);
Start depth-first walks from commits given.

+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
Here all_parents_computed is a boolean flag.  I see that it is easier to
start with the assumption that all parents have computed generation
numbers.

+			uint32_t max_generation = 0;
The generation number value of 0 functions as sentinel; generation
numbers start from 1.  Not that it matters much, as lowest possible
generation number is 1, and we could have started from that value.

Except that for a commit with no parents, we want it to receive generation number max_generation + 1 = 1, so this value of 0 is important.


+
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
If some parent doesn't have its generation number calculated, we add it
to the stack (and break out of the loop because it is a depth-first walk),
and mark this situation.  All right.

+				} else if (parent->item->generation > max_generation) {
+					max_generation = parent->item->generation;
Otherwise, update max_generation.  All right.

+				}
+			}
+
+			if (all_parents_computed) {
+				current->generation = max_generation + 1;
+				pop_commit(&list);
+			}
+
+			if (current->generation > GENERATION_NUMBER_MAX)
+				current->generation = GENERATION_NUMBER_MAX;
This conditional should be inside all_parents_computed test, for example
like this:

   +			if (all_parents_computed) {
   +				current->generation = max_generation + 1;
   +				if (current->generation > GENERATION_NUMBER_MAX)
   +					current->generation = GENERATION_NUMBER_MAX;
   +
   +				pop_commit(&list);
   +			}

(Noticed by Junio.)

Sidenote: when we revisit the commit, returning from depth-first walk of
one of its parents, we calculate max_generation from scratch again.
This does not matter for performance, as it's just data access and
calculating maximum - any workaround to not restart those calculations
would take more time and memory.  And it's simple.
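
For reference, the whole walk with the clamp moved inside the
all_parents_computed branch can be sketched on a toy commit type.  This
is an illustration only, not the code from the patch: struct toy_commit,
the fixed-size array used as a stack, the GEN_* constants and their
values are assumptions made so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the GENERATION_NUMBER_* constants. */
#define GEN_INFINITY 0xFFFFFFFFu /* not in the commit-graph */
#define GEN_ZERO     0u          /* in the commit-graph, not computed */
#define GEN_MAX      0x3FFFFFFFu /* largest 30-bit value */

#define TOY_MAX_PARENTS 2
#define TOY_MAX_STACK   64

/* Toy replacement for struct commit, for illustration only. */
struct toy_commit {
	uint32_t generation;
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

static int toy_is_computed(const struct toy_commit *c)
{
	return c->generation != GEN_INFINITY && c->generation != GEN_ZERO;
}

static void toy_compute_generation_numbers(struct toy_commit **commits,
					   size_t nr_commits)
{
	struct toy_commit *stack[TOY_MAX_STACK];
	size_t depth = 0;
	size_t i;

	for (i = 0; i < nr_commits; i++) {
		if (toy_is_computed(commits[i]))
			continue; /* reuse the existing value */

		stack[depth++] = commits[i];
		while (depth) {
			struct toy_commit *current = stack[depth - 1];
			uint32_t max_generation = 0;
			int all_parents_computed = 1;
			size_t p;

			for (p = 0; p < current->nr_parents; p++) {
				struct toy_commit *parent = current->parents[p];

				if (!toy_is_computed(parent)) {
					/* descend into this parent first */
					all_parents_computed = 0;
					assert(depth < TOY_MAX_STACK);
					stack[depth++] = parent;
					break;
				} else if (parent->generation > max_generation) {
					max_generation = parent->generation;
				}
			}

			if (all_parents_computed) {
				/* a root commit gets 0 + 1 = 1 */
				current->generation = max_generation + 1;
				if (current->generation > GEN_MAX)
					current->generation = GEN_MAX;
				depth--; /* pop */
			}
		}
	}
}
```

On a history where m has parents r and b, b has parent a, and a has
parent r, the walk assigns r = 1, a = 2, b = 3 and m = 4, and it rescans
the parents of m once for each parent it had to descend into, which is
the recomputation mentioned in the sidenote.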

+		}
+	}
+}
+
  void write_commit_graph(const char *obj_dir,
  			const char **pack_indexes,
  			int nr_packs,
@@ -694,6 +737,8 @@ void write_commit_graph(const char *obj_dir,
  	if (commits.nr >= GRAPH_PARENT_MISSING)
  		die(_("too many commits to write graph"));
+	compute_generation_numbers(commits.list, commits.nr);
+
Nice and simple.  All right.

I guess that we pass "struct commit **" and "int" (commits.list and
commits.nr) rather than "struct packed_commit_list commits" in order to
keep compute_generation_numbers() nice and generic?

Good catch. There is no reason to not use packed_commit_list here.


  	graph_name = get_commit_graph_filename(obj_dir);
  	fd = hold_lock_file_for_update(&lk, graph_name, 0);
Best,



