On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted: > pack-objects could be a big memory hog especially on large repos, > everybody knows that. The suggestion to stick a .keep file on the > largest pack to avoid this problem is also known for a long time. > > Let's do the suggestion automatically instead of waiting for people to > come to Git mailing list and get the advice. When a certain condition > is met, gc --auto create a .keep file temporary before repack is run, > then remove it afterward. > > gc --auto does this based on an estimation of pack-objects memory > usage and whether that fits in one third of system memory (the > assumption here is for desktop environment where there are many other > applications running). > > Since the estimation may be inaccurate and that 1/3 threshold is > arbitrary, give the user a finer control over this mechanism as well: > if the largest pack is larger than gc.bigPackThreshold, it's kept. This is very promising. Saves lots of memory on my ad-hoc testing of adding a *.keep file on an in-house repo. > + if (big_pack_threshold) > + return pack->pack_size >= big_pack_threshold; > + > + /* First we have to scan through at least one pack */ > + mem_want = pack->pack_size + pack->index_size; > + /* then pack-objects needs lots more for book keeping */ > + mem_want += sizeof(struct object_entry) * nr_objects; > + /* > + * internal rev-list --all --objects takes up some memory too, > + * let's say half of it is for blobs > + */ > + mem_want += sizeof(struct blob) * nr_objects / 2; > + /* > + * and the other half is for trees (commits and tags are > + * usually insignificant) > + */ > + mem_want += sizeof(struct tree) * nr_objects / 2; > + /* and then obj_hash[], underestimated in fact */ > + mem_want += sizeof(struct object *) * nr_objects; > + /* > + * read_sha1_file() (either at delta calculation phase, or > + * writing phase) also fills up the delta base cache > + */ > + mem_want += delta_base_cache_limit; > + /* and of course pack-objects has its own delta cache */ > + mem_want += max_delta_cache_size; I'm not familiar enough with this part to say, but isn't this assuming a lot about the distribution of objects in a way that will cause is not to repack in some pathological cases? Probably worth documenting... > + /* Only allow 1/3 of memory for pack-objects */ > + mem_have = total_ram() / 3; Would be great to have this be a configurable variable, so you could set it to e.g. 33% (like here), 50% etc.