Hello, On Sat, 5 Sep 2020 at 19:38, Taylor Blau <me@xxxxxxxxxxxx> wrote: > On Sat, Sep 05, 2020 at 07:22:08PM +0200, Jakub Narębski wrote: > > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > > How do you distinguish between no changed paths stored because there > > were no changes (which should not count as *_found_large), and no > > changed paths stored because there were too many changes? If I remember > > it correctly in current implementation both are represented as > > zero-length filter (no changed paths could have been represented as all > > zeros filter, too many changed paths could have been represented as all > > ones filter). > > Right, once we get handed back a filter from > 'get_or_compute_bloom_filter()', we can't distinguish between (a) a > commit with too many changes to store in a single Bloom filter, and (b) > a commit with no changes at all. We could change how we store either no-changes Bloom filter (as all zeros minimal size filter), or too-many-changes Bloom filter (as all ones, i.e. max unsigned value, minimal size filter). This change would not require to change any user of Bloom filter. > It's unfortunate that callers can't pick between the two, but this > implementation is actually an improvement on the status-quo! Why? > Because right now we'll see an "empty" Bloom filter and recompute it > because it's "missing", only to discover that it has no changes. > > With this patch, we'll say "this filter looks too large", and stop > computing it, because we have already gone through the effort to compute > it once (and marked it in the BFXL chunk). Can we use this when computing trace2 values? [...] > > This is a minor issue, though. > > Thanks for raising it. I don't think that this is a show-stopper for > this series. I agree. Best, -- Jakub Narębski