On Thu, Sep 30, 2021 at 7:39 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > On Tue, Sep 28 2021, Elijah Newren wrote: > > > On Tue, Sep 28, 2021 at 1:00 AM Ævar Arnfjörð Bjarmason > > <avarab@xxxxxxxxx> wrote: > >> > >> On Tue, Aug 31 2021, Elijah Newren via GitGitGadget wrote: > >> > >> I commented just now on how this API is duplicated between here & > >> another in-flight series in > >> https://lore.kernel.org/git/87v92lxhh4.fsf@xxxxxxxxxxxxxxxxxxx/; Just > >> comments on this patch here: > >> > >> > diff --git a/tmp-objdir.c b/tmp-objdir.c > >> > index b8d880e3626..9f75a75d1c0 100644 > >> > --- a/tmp-objdir.c > >> > +++ b/tmp-objdir.c > >> > @@ -288,7 +288,36 @@ const char **tmp_objdir_env(const struct tmp_objdir *t) > >> > return t->env.v; > >> > } > >> > > >> > +const char *tmp_objdir_path(struct tmp_objdir *t) > >> > +{ > >> > + return t->path.buf; > >> > +} > >> > >> Not your fault & this pre-dates your patch, but FWIW I prefer our APIs > >> that don't have these "hidden struct" shenanigans (like say "struct > >> string_list") so a caller could just access this, we can just declare it > >> "const" appropriately. > >> > >> We're also all in-tree friends here, so having various accessors for no > >> reason other than to access struct members seems a bit too much. > > > > That's a fair point, but just to play counterpoint for a minute... > > > > FWIW, I dislike when our public facing APIs are polluted with all > > kinds of internal details. merge-recursive being a case in point. > > When writing merge-ort, although I had a struct with public fields, > > that struct also contained an opaque struct (pointer) within it to > > hide several private fields. (I would have liked to hide or remove a > > few more fields, but couldn't do so while the merge_recursive_options > > struct was shared between both merge-ort and merge-recursive.) > > > > That said, I agree it can certainly be overdone, and tmp_objdir is > > pretty simple. However, sometimes even in simple cases I like when > > folks make use of an opaque struct because it means folks put some > > effort into thinking more about the API that should be exposed. > > That's something we as a project have often overlooked in the past, as > > evidenced both by our one-shot usage mentality, and the existence of > > external projects like libgit2 attempting to correct this design > > shortcoming. I'd like git to move more towards being structured as a > > reusable library as well as a useful command-line tool. > > [This is somewhat of a continuation of > https://lore.kernel.org/git/87lf3etaih.fsf@xxxxxxxxxxxxxxxxxxx/]. > > I like our APIs being usable, but right now we're not promising any > maintenance of APIs that work the same as git v2.30.0 or whatever. > > I think some external users directly link to libgit.a, but they're > tracking whatever churn we have from release to release, just like > someone maintaining out-of-tree linux kernel drivers. > > For maintenance of git.git itself I think it's going to stay like that > for any foreseeable future[1]. > > And that's before we even get into the practicalities of git's license > likely not winning over many (if any) libgit2 users. I think those users > are if anything migrating from libgit2 to git's plumbing CLI, or in some > cases folding what they needed into git.git itself (or just staying with > libgit2). > > So assuming all of that I really don't see the point in general of > having a "foo" field that has the equivalent of get_foo() and set_foo() > accessors in the context of git.git. > > That's something that's really useful if you're maintaining API and ABI > compatibility, but for us it's usually just verbosity. All the users are > in-tree, so just add "const" as appropriate, or perhaps we should give > the field a "priv" name or whatever. It wasn't just a get or set accessors, and there was more involved than just adding const: * There was a typechange from struct strbuf -> const char *; strbuf != char * * Having an API that _enforces_ accesses to be const provides more safety. Relying on callers to remember to declare their variable access as const provides virtually none. But more importantly than either of the above, tmp_objdir was locked down because it was too easy to make big mistakes with, as Peff pointed out. That meant that folks would only use it through the API, and if folks tried to do unusual things, they'd do so by adding new API, which is easier to flag in code reviews and make sure the right people get cc'ed than if folks just reach into some internals from some other file. And the new API that I added resulted in a discussion that pointed out I didn't need to access that field at all and should instead create and delete the tmp_objdir after every merge. > Obviously not everything should be done that way, e.g. being able to > have trace2 start/end points for something you'd otherwise push/clear > from a "struct string_list" or whatever directly *is* usable, so it's > not black and white. > > I'm just saying thaht using convoluted patterns *just* to get in the way > of some hypothetical code that's going to mess with your internals can > make things harder in other areas. E.g. the init/release pattern via > macros I mentioned in the linked E-Mail above. Fair points, but I just don't think that's relevant to the context here; i.e. these aren't convoluted patterns *just* to get in the way. Further, I don't see how this would adversely affect the *_INIT macros you mentioned at all. tmp_objdir is an opaque struct so you can only declare a pointer to it. Pointers being initialized to NULL is handled quite nicely by the *_INIT macros. > > [...] > >> In this particular case I hadn't understood on a previous reading of > >> tmp-objdir.c why this "path" isn't a statically allocated "char > >> path[PATH_MAX]", or a least we that hardcoded "1024" should be a > >> PATH_MAX, as it is in some other cases. > > > > Actually, PATH_MAX shouldn't be used: > > > > https://lore.kernel.org/git/7d1237c7-5a83-d766-7d93-5f0d59166067@xxxxxx/ > > https://lore.kernel.org/git/20180720180427.GE22486@xxxxxxxxxxxxxxxxxxxxx/ > > Thanks. I didn't know about that. I've fixed (some of that) up in some > semi-related WIP work I've got. > > 1. Aside: I *do* think it would be really useful for us to expose a > stable C API eventually, using some of the same license > workarounds/shenanigans (depending on your view) as the Linux kernel > did. I.e. to say that "we're GPLv2, but this small API is the > equivalent of our syscall interface". > > Rather than that API being something that would look anything like > internal git.git users it would basically be a run_command() + Perl's > unpack/pack + output formatter on steroids. > > I.e. we'd be able to invoke say a 'git cat-file --batch' in the same > process as the caller by essentially calling cmd_cat_file() > directly. Just with some callback replacements for "provide this on > stdin" and "this line accepts stdout" without actually involving > different processes or I/O. > > You'd then have the equivalent of a --format output that would just > so happen to be in your OS's / C ABI's understood struct > format. Similar to "perldoc -f pack". > > The maintenance burden of such an API would be trivial. It would > basically be a handful of functions just to setup input/output > callbacks. > > Most of the interface would be a rather convoluted construction of > command-line parameters (like our own run_command() use), not ideal, > but most users are after structured data and eliminating process > overhead, and they'd get that. > > We would need to replace fprintf() and the like with some structured > formatting mechanism, but that would be useful for non-API users as > new commands learning a shiny new --format parameter. You could even > use such a --format to have a non-C-API user get the same struct > output, perhaps for direct mapping into memory, say if you didn't > trust the built-in/command in question to not leak memory yet. > > I don't know if this has been suggested before, just my 0.02. It's > also partially why I'm interested in making even-built-ins not leak > memory with some of my parallel SANITIZE=leak work. > > Once you've got that you can usually call the cmd_whatever() in a > loop without much effort, and once you've got that, well, see above > :) This sounds really cool.