On Thu, Nov 17 2022, Eric Wong wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: >> On Wed, Nov 16 2022, Eric Wong wrote: >> > Memory savings were measured using the following patch which >> > relies on a patched LD_PRELOAD-based malloc debugger: >> > https://80x24.org/spew/20221116095404.3974691-1-e@xxxxxxxxx/raw >> >> FWIW SANITIZE=leak will find this if you stick a "remote_islands = NULL" >> and run e.g. t5320-delta-islands.sh, but maybe you needed this closer to >> production. >> >> Valgrind will also work, but of course be *much* slower. > > Yeah, I run that LD_PRELOAD thing in production since it's > cheap compared to valgrind. > >> Perfect shouldn't be the enemy of the good & all that, but in this case >> it's not too much more effort to just give this data an appropriate >> lifetime instead of the global, I tried that out for just the "regex" >> part of this below. >> >> The free_remote_islands() seems to be similarly alive between >> "find_island_for_ref" and "deduplicate_islands". >> >> Your version also works, but the root cause of this sort of thing is >> these global lifetimes, which sometimes we do for a good reason, but in >> this case we don't. > > Agreed on all points. Overall, the amount of globals in git has > long seemed excessive and offputting to me (and likely other > drive-by hackers). > >> diff --git a/delta-islands.c b/delta-islands.c >> index 26f9e99e1a9..ef86a91059c 100644 >> --- a/delta-islands.c >> +++ b/delta-islands.c >> @@ -312,29 +312,41 @@ void resolve_tree_islands(struct repository *r, >> free(todo); >> } >> >> -static regex_t *island_regexes; >> -static unsigned int island_regexes_alloc, island_regexes_nr; >> +struct island_config_data { >> + regex_t *rx; >> + size_t nr; >> + size_t alloc; >> +}; > > I've added kh_str_t *remote_islands and renamed > s/island_config_data/island_load_data/ in the below version > to reflect the slightly different scope of remote_islands. > >> static const char *core_island_name; >> >> -static int island_config_callback(const char *k, const char *v, void *cb UNUSED) >> +static void island_config_data_release(struct island_config_data *icd) >> +{ >> + for (size_t i = 0; i < icd->nr; i++) >> + regfree(&icd->rx[i]); >> + free(icd->rx); >> +} > > icd => ild since config => load > >> +static int island_config_callback(const char *k, const char *v, void *cb) >> { >> + struct island_config_data *data = cb; >> + > > data => ild > > I don't like the name `data' for a typed variable. Hah! My thought process when deciding on it was "hrm, what *do* we call the two variables when we have a void * and turn it into a 'util'? data? cb? util? ... and which one was which?" I started grepping, then decided I was wasting too much time on that for a one-off reply, and just went with the first thing I found. The names you picked are a lot better :) > Aside from that, v2 below still frees the regex memory early on > in the hopes deduplicate_islands() can reuse some of the freed > regexp memory. > > Anyways, here's v2, which seems to work. I'm still trying to > figure out SATA errors+resets after replacing a CMOS battery, > but I really hope this patch isn't the cause. > > -----8<----- > From: Eric Wong <e@xxxxxxxxx> > Subject: [PATCH] delta-islands: free island-related data after use > > On my use case involving 771 islands of Linux on kernel.org, > this reduces memory usage by around 25MB. The bulk of that > comes from free_remote_islands, since free_config_regexes only > saves around 40k. > > This memory is saved early in the memory-intensive pack process, > making it available for the remainder of the long process. > > Signed-off-by: Eric Wong <e@xxxxxxxxx> > Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> This all looks good to me, thanks a lot for the follow-up.