On Wed, Sep 18, 2019 at 4:55 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Matheus Tavares <matheus.bernardino@xxxxxx> writes: > [...] > > - textconv cache is written to the_repository's object database even for > > submodules. Should it perhaps be written to submodules' odb instead? > > You mention "is written", but that is what happens upon a cache > miss. Before the code notices a cache miss, it must be checking if > a cached result is available. In which odb is it done? Writing > that follow the miss should happen to the same one, or the cache is > not very effective. Right. Both writing and reading operations on the textconv cache always happen in the_repository's odb. > Because you would want the cache to be effective, after running "git > grep --recurse-submodules" from the top-level, when you chdir down > to the submodule and say "git grep" to dig further, the answer to > your question is most likely "yes". OK, makes sense. To keep this series simple, I think I'll try to work on this in a following patchset. > > - Considering the following call chain: grep_source_load_driver() > > > userdiff_find_by_path() > git_check_attr() > collect_some_attrs() > > > prepare_attr_stack() > bootstrap_attr_stack(): > > > > * The last function tries to read the attributes from the > > .gitattributes and .git/info/attributes files of the_repository. > > However, for paths inside the submodule, shouldn't it try to read > > these files from the submodule? > > Yes, I think all of those would have worked correctly if we forked a > separate "git grep" inside submodule repository, but in the rush to > "do everything in process", many things like these are not done > correctly. As there is only one attribute cache IIUC, invalidating > the whole cache for the top-level and replacing it with the one for > a submodule, every time we cross the module boundary, would probably > have a negative effect on the performance, and I am not sure what > would happen if you run more than one threads working in different > repositories (i.e. top-level and submodules). Hmm, I may have gotten a little confused here. Are you talking about the attributes stack (which contains .gitattributes and info/attributes)? If so, isn't this stack already rebuild for every path? I mean, by the previous call chain it seems to me that at least these two files are reread for every path. > > * This function will also call: read_attr() > read_attr_from_index() > > > read_blob_data_from_index() which might, in turn, call > > read_object_file(). Shouldn't we pass the subrepo to it so that it > > can call repo_read_object_file()? (Again, for paths inside the > > submodule, read_object_file() won't be able to find the object as > > we won't be adding to alternates anymore.) Just as a side note, I noticed another problem with this: in a submodule with a non-checked-out .gitattributes, `git grep --textconv` will successfully retrieve the file from the index. But running `git grep --recurse-submodules --textconv` from the superproject won't. The problem is that read_blob_data_from_index() doesn't have the repository struct and thus cannot strip the subrepo_prefix from the path (so it won't find it in the subrepo's index). I'll try to fix this in this series as well.