Yesterday while down a rabbit hole, I discovered an interesting thing about the 'omitted' object in many filters in list-objects-filter-options.h. It appears that when we call list-objects.h:traverse_commit_list_filtered() with a non-NULL 'omitted' argument, we still perform a walk of all objects - that is, the filtered walk is no more efficient than the unfiltered walk from the same place via 'traverse_commit_list()'. I verified by calling each and counting the objects:

    if (0) {
        /* Unfiltered: */
        printf(_("Unfiltered object walk.\n"));
        traverse_commit_list(rev, walken_show_commit,
                walken_show_object, NULL);
    } else {
        printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
        /*
         * We can parse a tree depth of 1 to demonstrate the kind of
         * filtering that could occur eg during shallow cloning.
         */
        parse_list_objects_filter(&filter_options, "tree:1");

        traverse_commit_list_filtered(&filter_options, rev,
            walken_show_commit, walken_show_object, NULL, &omitted);
    }

    /* Count the omitted objects. */
    oidset_iter_init(&omitted, &oit);

    while ((oid = oidset_iter_next(&oit)))
        omitted_count++;

    printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
           "and %d trees; %d omitted objects.\n"), commit_count,
           blob_count, tag_count, tree_count, omitted_count);

I found that omitted_count was always equal to the difference between sum(blob_count, tag_count, tree_count, commit_count) in the unfiltered and filtered walks. I also found that the length of time required to perform the unfiltered walk and the filtered-with-non-NULL-omitted walk was the same, while the time required to perform the filtered walk with NULL omitted was significantly shorter. (The walk in question was over the latest release of Git master, plus the ten or so commits in my feature branch.)

I was surprised!
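To make the two possible meanings of the omitted set concrete, here is a toy model in C. It is only a sketch: struct toy_object, count_all_omitted() and count_border_omitted() are hypothetical names invented for illustration, none of this is Git's code, and the model assumes a plain tree in which no object is shared between directories. With a cut-off like 'tree:1' (objects at depth >= 1 below the root tree are omitted), the first function records every omitted object and so has to visit the whole tree, while the second records only the first omitted object on each path and stops descending there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: a tree lists children, a blob has none. */
struct toy_object {
	const struct toy_object *const *children; /* NULL for a blob */
	size_t nr_children;
};

/*
 * Count every object at depth >= limit. This always visits the whole
 * tree, like the behaviour observed with a non-NULL 'omitted'.
 */
static size_t count_all_omitted(const struct toy_object *obj,
				size_t depth, size_t limit)
{
	size_t i, n = depth >= limit ? 1 : 0;

	assert(obj != NULL);
	for (i = 0; i < obj->nr_children; i++)
		n += count_all_omitted(obj->children[i], depth + 1, limit);
	return n;
}

/*
 * Count only the "border": the first omitted object on each path.
 * Nothing below an omitted object is visited.
 */
static size_t count_border_omitted(const struct toy_object *obj,
				   size_t depth, size_t limit)
{
	size_t i, n = 0;

	assert(obj != NULL);
	if (depth >= limit)
		return 1; /* omitted: record it and do not descend */
	for (i = 0; i < obj->nr_children; i++)
		n += count_border_omitted(obj->children[i], depth + 1, limit);
	return n;
}

/* Sample data: root/ holds README and src/, and src/ holds a.c and b.c. */
static const struct toy_object readme = { NULL, 0 };
static const struct toy_object a_c = { NULL, 0 };
static const struct toy_object b_c = { NULL, 0 };
static const struct toy_object *const src_entries[] = { &a_c, &b_c };
static const struct toy_object src = { src_entries, 2 };
static const struct toy_object *const root_entries[] = { &readme, &src };
static const struct toy_object root = { root_entries, 2 };
```

Calling both functions on &root with depth 0 and limit 1 gives 4 for count_all_omitted() (README, src/, a.c, b.c) and 2 for count_border_omitted() (README and src/ only). The first number is what the counts above suggest we record today; the second is what I had expected.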
I figured that with filter "tree:1", "omitted" would contain only the objects on the "border" of the filter - that is, I assumed it would contain the blobs and trees in the root git dir, but none of those trees' blobs and trees.

After talking with jrnieder at length, it sounds like neither of us was clear on why this "omitted" list would be needed beyond the initial development stage of a new filter... Jonathan's impression was also that if we do need the "omitted" list, it may be a bug that we're still traversing objects which are only reachable from objects already omitted.

I grepped the Git source and found that we only provide a non-NULL "omitted" when someone calls "git rev-list --filter-print-omitted". We verify that with a simple test case for "blobs:none", in which case the "border" objects which were omitted must be the same as all objects which were omitted (since blobs don't point to anything else). I think if we had written a similar test case with some trees we expect to omit, we might have noticed sooner.

Since I was already in the rabbit hole, out of curiosity I did a search on GitHub and only found one project referencing --filter-print-omitted which wasn't a mirror of Git:

https://github.com/search?l=Python&q=%22--filter-print-omitted%22&type=Code

So, what do we use --filter-print-omitted for? Does anybody need it? Or do we just use it to verify this one test case? Should we fix it, or get rid of it, or neither?

- Emily