"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Derrick Stolee <stolee@xxxxxxxxx> > > In anticipation of a few planned applications, introduce the most basic form > of a path-walk API. It currently assumes that there are no UNINTERESTING > objects, and does not include any complicated filters. It calls a function > pointer on groups of tree and blob objects as grouped by path. This only > includes objects the first time they are discovered, so an object that > appears at multiple paths will not be included in two batches. > > These batches are collected in 'struct type_and_oid_list' objects, which > store an object type and an oid_array of objects. > > The data structures are documented in 'struct path_walk_context', but in > summary the most important are: > > * 'paths_to_lists' is a strmap that connects a path to a > type_and_oid_list for that path. To avoid conflicts in path names, > we make sure that tree paths end in "/" (except the root path with > is an empty string) and blob paths do not end in "/". > > * 'path_stack' is a string list that is added to in an append-only > way. This stores the stack of our depth-first search on the heap > instead of using recursion. > > * 'path_stack_pushed' is a strmap that stores path names that were > already added to 'path_stack', to avoid repeating paths in the > stack. Mostly, this saves us from quadratic lookups from doing > unsorted checks into the string_list. > > The coupling of 'path_stack' and 'path_stack_pushed' is protected by the > push_to_stack() method. Call this instead of inserting into these > structures directly. > > The walk_objects_by_path() method initializes these structures and > starts walking commits from the given rev_info struct. The commits are > used to find the list of root trees which populate the start of our > depth-first search. Isn't this more of breadth-first search? Reading through the code, the algorithm seems something like: - For each commit in list of commits (from rev_info) - Tackle each root tree, add root path to the stack. - For each path in stack left - Call the callback provided by client. - Find all its first level children, add each to the stack. So wouldn't this go through the tree in level by level basis? Making it a BFS? Apart from this, the patch itself looks solid. I ended up writing a small client to play with this API, and was very pleased how quickly I could get it running. [snip]
Attachment:
signature.asc
Description: PGP signature