On Wed, Sep 01 2021, Jeff King wrote: > On Tue, Aug 31, 2021 at 02:47:01PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > That works, but now "util" is not available for all the _other_ uses for >> > which it was intended. And if we're not using it for those other uses, >> > then why does it need to exist at all? If we are only using it to hold >> > the allocated string pointer, then shouldn't it be "char *to_free"? >> >> Because having it be "char *" doesn't cover the common case of >> e.g. getting an already allocated "struct something *" which contains >> your string, setting the "string" in "struct string_list_item" to some >> string in that struct, and the "util" to the struct itself, as we now >> own it and want to free() it later in its entirety. > > OK. I buy that storing a void pointer makes it more flexible. I'm not > altogether convinced this pattern is especially common, but it's not any > harder to work with than a "need_to_free" flag, so there's no reason not > to do that (and to be fair, I didn't look around for possible uses of > the pattern; it's just not one I think of as common off the top of my > head). > >> That and the even more common case I mentioned upthread of wanting to >> ferry around the truncated version of some char *, but still wanting to >> account for the original for an eventual free(). >> >> But yes, if you want to account for freeing that data *and* have util >> set to something else you'll need to have e.g. your own wrapper struct >> and your own string_list_clear_func() callback. > > But stuffing it into the util field of string_list really feels like a > stretch, and something that would make existing string_list use painful. > There are tons of cases where util points to some totally unrelated (in > terms of memory ownership) item. I'd venture to say most cases where > string_list_clear() is called without free_util would count here. For what it's worth I've got some WIP code that's part of my daily build where I did end up going through all those callers, as part of general string_list_clear() improvements mentioned offhand in https://lore.kernel.org/git/87bl6kq631.fsf@xxxxxxxxxxxxxxxxxxx/ This is just from fuzzy memory & I can't recall the specifics (and haven't combed through that WIP code now), but it's something like that in the ~100 uses of string_list in our codebase 60-70% are the simple case where the "strdup_strings" and string_list_clear() is enough, maybe another 10-20% have a "util" field they manage or not, 5%-ish have a simple string_list_clear_func(). It was just 2-3 cases that leaked memory due to skipping a prefix and sticking it in the list, and maybe another 1-2 where the void* to a struct containing the string stuck into the string slot was something we could use. So it's not "common" in the sense of absolute numbers, but I did run into a handful of them, and having them handled by having the string_list take an arbitrary "util" was something I found neat. I should probably have said "well known" (as in "well known technique"), "idiomatic" or something... >> > I don't think most interfaces take a string_list_item now, so wouldn't >> > they similarly need to be changed? Though the point is that all of these >> > degrade to a regular C-string, so when you are just passing the value >> > (and not ownership), you would just dereference at that point. >> >> Sure, just like things would need to be changed to handle your proposed >> "struct def_string". >> >> By piggy-backing on an already used struct in our codebase we can get a >> lot of that memory management pretty much for free without much >> churn. >> >> If you squint and pretend that "struct string_list_item" isn't called >> something to do with that particular collections API (but it would make >> use of it) then we've already set up most of the scaffolding and >> management for this. > > It's that squinting that bothers me. Sure, it's _kinda_ similar. And I > don't have any problem with some kind of struct that says "this is a > string, and when you are done with it, this is how you free it". And I > don't have any problem with building the "dup" version of string_list > with that struct as a primitive. But it seems to me to be orthogonal > from the "util" pointer of a string_list, which is about creating a > mapping from the string to some other thing (which may or may not > contain the string, and may or may not be owned). The "util" is whatever the user makes it. We could add a "pointer_to_free" to every container type to solve this more cleanly/generally at the API level, but just handing the problem to the user seems better to me. I.e. an API like string_list has convenience functions for freeing all the "util", if you only need it for memory tracking use it as-is, if you need a "real util" *and* such tracking just create a 2-member wrapper struct yourself & use that. > TBH, I have always found the "util" field of string_list a bit ugly (and > really most of string_list). I think most cases would be better off with > a different data structure (a set or a hash table), but we didn't have > convenient versions of those for a long time. I don't mind seeing > conversions of string_list to other data structures. But that seems to > be working against using string_list's string struct in more places. If we followed my idle musings we'd be using string_list_item in more places, not necessarily string_list, and would rename s/string_list_item/string_and_util/ or something. One way to look at this problem is that we're pretty close to just re-inventing the sort of generalized refcounted container type that some programming languages carry around. E.g. Perl has a "struct SV*" that a $string maps to, but also hash and array values etc. Those languages usually have a "refcount" or whatever, but since we're using this in native C and it's usually (or at least should be) clear who owns the memory just having something to point free() at will do. I'm just saying that if we're going halfway there it would be unfortunate if we'd end up with a "struct def_string" which wouldn't handle this "borrowing a string from a struct" case. Or maybe we should just use "struct strbuf" and do copying in even more places...