On Tue, 1 May 2018 14:33:51 -0700 Stefan Beller <sbeller@xxxxxxxxxx> wrote: > Git's object access code can be thought of as containing two layers: > the raw object store provides access to raw object content, while the > higher level obj_hash code parses raw objects and keeps track of > parenthood and other object relationships using 'struct object'. > Keeping these layers separate should make it easier to find relevant > functions and to change the implementation of one without having to > touch the other. I only understood this after reading the code below. I think this paragraph can be removed; I don't see its relevance - of course we need to store metadata about how to load objects somewhere, and caching objects we have already loaded is a good idea: and the metadata and cache are two separate things before and after this patch anyway. > Add an object_parser field to 'struct repository' to prepare obj_hash > to be handled per repository. Callers still only use the_repository > for now --- later patches will adapt them to handle arbitrary > repositories. I think this is better reworded as: Convert the existing global cache for parsed objects (obj_hash) into repository-specific parsed object caches. Existing code that uses obj_hash are modified to use the parsed object cache of the_repository; future patches will use the parsed object caches of other repositories. > +struct object_parser *object_parser_new(void) > +{ > + struct object_parser *o = xmalloc(sizeof(*o)); > + memset(o, 0, sizeof(*o)); > + return o; > +} I'm not sure that I like this style of code (I prefer the strbuf style with _INIT and release(), which I think is more flexible) but I don't feel too strongly about it. > +struct object_parser { > + struct object **obj_hash; > + int nr_objs, obj_hash_size; > +}; object_parser is probably a bad name. I think Duy had some good suggestions in [1]. [1] https://public-inbox.org/git/CACsJy8CgX6BME=EhiDBfMRzBOYDBNHE6Ouxv4fZC-GW7Rsi81Q@xxxxxxxxxxxxxx/ > /* > - * Holds any information related to accessing the raw object content. > + * Holds any information needed to retrieve the raw content > + * of objects. The object_parser uses this to get object > + * content which it then parses. > */ > struct raw_object_store *objects; I think the additional sentence ("The object_parser uses this...") is unnecessary and confusing, especially if its identity is going to be one of storage (like "parsed_objects" implies). > + /* > + * State for the object parser. This owns all parsed objects > + * (struct object) so callers do not have to manage their > + * lifetime. > + */ > + struct object_parser *parsed_objects; Even after all the commits in this patch set, this does not store any state for parsing. Maybe just document as: All objects in this repository that have been parsed. This structure owns all objects it references, so users of "struct object *" generally do not need to free them; instead, when a repository is no longer used, call object_parser_clear() on this structure. (And maybe say that the freeing method on struct repository will automatically call object_parser_clear().)