On Fri, May 04 2018, Jakub Narebski wrote: (Just off-the cuff here and I'm surely about to be corrected by Derrick...) > * What to do about merge commits, and octopus merges in particular? > Should Bloom filter be stored for each of the parents? How to ensure > fast access then (fixed-width records) - use large edge list? You could still store it fixed with, you'd just say that if you encounter a merge with N parents the filter wouldn't store files changed in that commit, but rather whether any of the N (including the merge) had changes to files as of the their common merge-base. Then if they did you'd need to walk all sides of the merge where each commit would also have the filter to figure out where the change(s) was/were, but if they didn't you could skip straight to the merge base and keep walking. Which, on the topic of what else a commit graph could store: A mapping from merge commits of N parents to the merge-base of those commits. You could also store nothing for merges (or only files the merge itself changed v.s. its parents). Derrick talked about how the bloom filter implementation has a value that's "Didn't compute (for whatever reason), look at it manually". > * Then there is problem of rename and copying detection - I think we can > simply ignore it: unless someone has an idea about how to handle it? > > Though this means that "git log --follow <file>" wouldn't get any > speedup, and neither the half of "git gui blame" that runs "git blame > --incremental -C -C -w" -- the one that allows code copying and > movement detection. Couldn't the bloom filter also speed up --follow if you did two passes through the history? The first to figure out all files that ever changed names, and then say you did `--follow sha1-name.c` on git.git. The filter would have had all the bits for both sha1_name.c and sha1-name.c set on all commits that touched either for all of the history. Of course this would only work with a given default value of -M<n>, but on the assumption that most users left it at the default, and furthermore that renames weren't so common as to make the filter useless with too many false-positives as a result, it might be worth it. If you