Taylor Blau <me@xxxxxxxxxxxx> wrote: > What is the purpose of the GIT_READ_COMMIT_TABLE environment variable? I > assume that it's to make it easier to run tests (especially performance > ones) with and without access to the lookup table. If so, we should > document that (lightly) in the commit message, and rename this to be > GIT_TEST_READ_COMMIT_TABLE to indicate that it shouldn't be used outside > of tests. This is mainly for testing, GIT_TEST_READ_COMMIT_TABLE is perfect. > All makes sense. Some light documentation might help explain what this > comparator function is used for (the bsearch() call below in > bitmap_table_lookup()), although I suspect that this function will get > slightly more complicated if you pack the table contents as I suggest, > in which case more documentation will definitely help. Ok. > Interesting; this is a point that I forgot about from the original > patch. xor_pos is an index (not an offset) into the list of commits in > the table of contents in the order appear in that table. We should be > clear about (a) what that order is, and (b) that xor_pos is an index > into that order. This is exactly what I said in my first reply. I made a mistake here. (1) As xor_pos is relative to the current bitmap, it depends on the bitmap entry order. These two order are not same. One is ordered by date, another is lexicographically ordered. I will fix it. > Yeah. The problem here is that we can't record every commit that > _doesn't_ have a bitmap every time we return NULL from one of these > queries, since there are arbitrarily many such commits that don't have > bitmaps. > > We could approximate it using a Bloom filter or something, and much of > that code is already written and could be interesting to try and reuse. > But I wonder if we could get by with something simpler, though, which > would cause us to load all bitmaps from the lookup table after a fixed > number of cache misses (at which point we should force ourselves to load > everything and just read everything out of a single O(1) lookup in the > stored bitmap table). > > That may or may not be a good idea, and the threshold will probably be > highly dependent on the system. So it may not even be worth it, but I > think it's an interesting area to experiemnt in and think a little more > about. Now I got the point. I wonder what if we leave it as it is. How much will it affect the code? > How does this commit_pos work again? I confess I have forgetten since I > wrote some of this code a while ago... :-). It is using recursive strategy. The first call to `stored_bitmap_for_commit` function do not have `pos_hint`. So, it uses `bitmap_table_lookup` to find the commit position in the list and makes a call to `lazy_bitmap_for_commit` function. This function gets the offset and xor-offset using the commit id's position in the list. If xor-offset exists, it is using this xor-offset to get the xor-bitmap by calling `stored_bitmap_for_commit` again. But this time `pos_hint` is xor-offset. This goes on till the last non-xor bitmap has found. As I said before, xor-offset should be an absolute value to make it work correctly. > Should we print this regardless of whether or not there is a lookup > table? We should be able to learn the entry count either way. No, this is necessary. "Bitmap v1 test (%d entries loaded)" means all the bitmap entries has been loaded. It is basically for `load_bitmap_entries_bitmap_v1` function which loads all the bitmaps One by one. But if there is a lookup table, `prepare_bitmap_git` function will not load every entries and thus printing the above line is wrong. Thanks :)