On Fri, Sep 30, 2016 at 10:54:16AM -0700, Linus Torvalds wrote:

> On Fri, Sep 30, 2016 at 1:06 AM, Jeff King <peff@xxxxxxxx> wrote:
> >
> > I agree that this deals with the performance concerns by caching the
> > default_abbrev_len and starting there. I still think it's unnecessarily
> > invasive to touch get_short_sha1() at all, which is otherwise only a
> > reading function.
>
> So the reason that doesn't work is that the "disambiguate_state" data
> where we keep the number of objects is only visible within
> get_short_sha1().
>
> So outside that function, you don't have any sane way to figure out
> how many objects. So then you have to do the extra counting function..

Right. I think you should do the extra counting function. It's a few
more lines, but the design is way less tangled.

> > So IMHO, the best combination is the init_default_abbrev() you posted in
> > [1], but initialized at the top of find_unique_abbrev(). And cached
> > there, obviously, in a similar way.
>
> That's certainly possible, but I'm really not happy with how the
> counting function looks. And nobody actually stood up to say "yeah,
> that gets alternate loose objects right" or "if you have tons of those
> alternate loose objects you have other issues anyway". I think
> somebody would have to "own" that counting function, the advantage of
> just putting it into disambiguate_state is that we just get the
> counting for free..

I don't think you _need_ to get the alternate loose objects right. In
fact, I don't think you need to care about loose objects at all. For
the scales we're talking about, they're a rounding error.

I would have done it like this:

diff --git a/sha1_file.c b/sha1_file.c
index 65deaf9..1845502 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1382,6 +1382,33 @@ static void prepare_packed_git_one(char *objdir, int local)
 	strbuf_release(&path);
 }

+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+	static unsigned long count;
+	if (!approximate_object_count_valid) {
+		struct packed_git *p;
+
+		prepare_packed_git();
+		count = 0;
+		for (p = packed_git; p; p = p->next) {
+			if (open_pack_index(p))
+				continue;
+			count += p->num_objects;
+		}
+		approximate_object_count_valid = 1;
+	}
+	return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
 	return ((const struct packed_git *)p)->next;
@@ -1456,6 +1483,7 @@ void prepare_packed_git(void)

 void reprepare_packed_git(void)
 {
+	approximate_object_count_valid = 0;
 	prepare_packed_git_run_once = 0;
 	prepare_packed_git();
 }