Junio C Hamano <gitster@xxxxxxxxx> writes: > David Kastrup <dak@xxxxxxx> writes: > >> The previous implementation uses a sorted linear list of struct >> blame_entry in a struct scoreboard for organizing all partial or >> completed work. Every task that is done requires going through the >> whole list where most entries are not relevant to the task at hand. >> >> This commit reorganizes the data structures in order to have each >> remaining subtask work with its own sorted linear list it can work off >> front to back. Subtasks are organized into "struct origin" chains >> hanging off particular commits. Commits are organized into a priority >> queue, processing them in commit date order in order to keep most of >> the work affecting a particular blob collated even in the presence of >> an extensive merge history. In that manner, linear searches can be >> basically avoided anywhere. They still are done for identifying one >> of multiple analyzed files in a given commit, but the degenerate case >> of a single large file being assembled from a multitude of smaller >> files in the past is not likely to occur often enough to warrant >> special treatment. >> --- > > Sign-off? Not while this is not fit for merging because of #if 0 etc and missing functionality. This is just for review. > Actually, I'd like to take my previous suggestion to add this as > blame2 (to replace blame in the future) back. That was based on my > fear/hope to see an implementation based on a completely different > approach, but the basic premise of having one blame_entry record per > the lines of the final image in the scoreboard, using diff between > parents to the child to find common lines for passing the blame up, > etc. have not changed at all and the change is all about organizing > the pieces of data in a *much* *better* way to avoid needlessly > finding what we already have computed. Yes. Basically, the call graph outside of blame.c itself should be pretty much the same. > Style; please make /* and */ sit on their own line without anything > else in multi-line comments. Will do. >> + struct origin *next; >> struct commit *commit; >> + /* `suspects' contains blame entries that may be attributed to >> + * this origin's commit or to parent commits. When a commit >> + * is being processed, all suspects will be moved, either by >> + * assigning them to an origin in a different commit, or by >> + * shipping them to the scoreboard's ent list because they >> + * cannot be attributed to a different commit. >> + */ >> + struct blame_entry *suspects; >> mmfile_t file; >> unsigned char blob_sha1[20]; >> unsigned mode; >> + /* shipped gets set when shipping any suspects to the final >> + * blame list instead of other commits >> + */ >> + char shipped; > > Unused? More like "added, forgotten, added under yet another name": >> + char guilty; I'll have to decide which one to keep. >> + /* Should be present exactly once in commit chain */ >> + for (p = o->commit->util; p; l = p, p = p->next) { >> + if (p == o) >> + { > > Style; please have { sit with the control structure that opens the > block, unless it is the opening brace of the function body or > struct/union definition. Ok. >> +static struct blame_entry * >> +blame_merge(struct blame_entry *list1, >> + struct blame_entry *list2) > > Style; we do not do GNU-ish "function name begins a line". Even > though I personally find it easier to use for things like 'grep', > that is not the style we use around here. Ok. I'm also certain to have a few "space between function name and arguments" left and will grep for those before submitting the next version. Next version will at least include -M option, possibly leaving -C for later. -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html