On Fri, Feb 17, 2023 at 2:57 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Emily Shaffer <nasamuffin@xxxxxxxxxx> writes:
>
> > Basically, if this effort turns out not to be fruitful as a whole, I'd
> > like for us to still have left a positive impact on the codebase.
> > ...
> > So what's next? Naturally, I'm looking forward to a spirited
> > discussion about this topic - I'd like to know which concerns haven't
> > been addressed and figure out whether we can find a way around them,
> > and generally build awareness of this effort with the community.
>
> One of the gravest concerns is that the devil is in the details.
>
> For example, "die() is inconvenient to callers, let's propagate
> errors up the callchain" is an easy thing to say, but it would take
> much more than "let's propagate errors up" to libify something like
> check_connected() to do the same thing without spawning a separate
> process that is expected to exit with failure.

Because the error propagation path is complicated, you mean? Or because
the cleanup is painful? (I've appended a rough sketch of the kind of
signature change I have in mind at the end of this mail.)

I wonder about this idea of spawning a worker thread that can terminate
itself, though. Is it a bad idea? Is it a hacky way of pretending that
we have exceptions? I guess if we have a thread then we still have the
same concerns about memory management (which we don't have if we use a
child process). (I'll reply to demerphq's mail in detail, but it seems
like the hardest part of this is memory cleanup, no?)

In other cases, we might want to perform work that can be sped up by
using more threads; how do we want to expose that to the caller? Do we
want to manage our own threads, or do we want to hand orchestration of
those worker threads to the caller, who might have a faster way to run
them (GPU or distributed execution, say) or might already be using
their own thread pool manager? (There's a second sketch of that at the
end of this mail.)

>
> It is not clear if we can start small, work on a subset of the
> things and still reap the benefit of libification. Is there an
> existing example that we have successfully modularized the API into
> one subsystem? Offhand, I suspect that the refs API with its two
> implementations may be reasonably close, but is the interface into
> that subsystem the granularity of the library interface you guys
> have in mind?

I think many of our internal APIs, especially the lower-level ones, are
actually quite well modularized, or close enough that you can't really
tell they aren't. run-command.h and config.h come to mind. The ones
that aren't tend to be frustrating to work with anyway - is it
reasonable to consider, for example, further cleanup of cache.h as part
of this effort? Is it reasonable to rework an ugly circular dependency
between two headers as a prerequisite to doing library work around one
of them?

I had a look at the refs API documentation, but it seems that we don't
actually have a way for the code to use reftable. Is that what you
meant by the two implementations of the refs API, or am I missing
something else? Anyway, abstracting at the "which backend do I want to
use" layer seems absolutely appropriate to me, if we're discussing
places where Git can use an alternative implementation (the third
sketch below gestures at this). For example, this also makes it easy
for Git to use some random NoSQL table as a ref store, if that's what
the caller wants. For the most part refs.h seems like it has things I
would want to expose to external callers (or that I would want to
reimplement as a library author).

 - Emily
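
P.S. Below are three tiny, self-contained sketches of what I'm gesturing
at above. None of this is actual Git code - every name in them is made
up for illustration - they're just to make the shape of the interfaces
concrete.

First, the die() point: the library-ish version of a function reports
failure to its caller instead of exiting the process, and the caller
decides how fatal that is. Something like:

    /*
     * Hypothetical sketch, not real Git code: load_config_file() and
     * the fixed-size error buffer are invented for illustration.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Report failure to the caller instead of calling die(). */
    static int load_config_file(const char *path, char *err, size_t errlen)
    {
    	FILE *fp = fopen(path, "r");

    	if (!fp) {
    		snprintf(err, errlen, "could not open '%s': %s",
    			 path, strerror(errno));
    		return -1;	/* the caller decides whether this is fatal */
    	}
    	/* ... parse the file ... */
    	fclose(fp);
    	return 0;
    }

    int main(void)
    {
    	char err[256];

    	if (load_config_file("/no/such/config", err, sizeof(err)) < 0) {
    		/* a command-line caller may still choose to exit ... */
    		fprintf(stderr, "fatal: %s\n", err);
    		return 128;
    	}
    	/* ... while an embedding application could retry, prompt, etc. */
    	return 0;
    }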
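
Second, handing thread orchestration to the caller: the library only
describes independent units of work, and the embedding application
supplies the function that actually runs them (serially, with its own
thread pool, on whatever scheduler it likes). Again, these names are
invented, nothing like this exists in Git today:

    #include <stddef.h>
    #include <stdio.h>

    typedef void (*task_fn)(void *data);

    struct task {
    	task_fn fn;
    	void *data;
    };

    /* Signature the embedding application implements. */
    typedef void (*task_runner_fn)(struct task *tasks, size_t n,
    			       void *runner_data);

    /* Fallback runner used when the caller does not provide one. */
    static void run_tasks_serially(struct task *tasks, size_t n,
    			       void *runner_data)
    {
    	size_t i;
    	(void)runner_data;
    	for (i = 0; i < n; i++)
    		tasks[i].fn(tasks[i].data);
    }

    /* Example "library" operation with parallelizable pieces. */
    static void checksum_one_pack(void *data)
    {
    	printf("checksumming %s\n", (const char *)data);
    }

    static void verify_packs(task_runner_fn run, void *runner_data)
    {
    	struct task tasks[] = {
    		{ checksum_one_pack, "pack-1" },
    		{ checksum_one_pack, "pack-2" },
    	};
    	if (!run)
    		run = run_tasks_serially;
    	run(tasks, 2, runner_data);
    }

    int main(void)
    {
    	/* A real caller could pass a runner backed by its own pool. */
    	verify_packs(NULL, NULL);
    	return 0;
    }

A caller that doesn't care just passes NULL and gets the serial
fallback; a caller with a thread pool (or something fancier) plugs in
its own runner without us ever touching pthreads on its behalf.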
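
Third, the "which backend do I want to use" layer for refs, as a vtable
the caller can swap out. Git's internal ref backend machinery is in this
spirit already, but the names and signatures below are made up:

    #include <stdio.h>

    struct ref_store_vtable {
    	const char *name;
    	/* Resolve refname into buf; return 0 on success, -1 on error. */
    	int (*read_ref)(const char *refname, char *buf, size_t len);
    };

    static int files_read_ref(const char *refname, char *buf, size_t len)
    {
    	/* Stand-in for reading loose refs and packed-refs on disk. */
    	snprintf(buf, len, "files backend value of %s", refname);
    	return 0;
    }

    static int nosql_read_ref(const char *refname, char *buf, size_t len)
    {
    	/* Stand-in for an embedder's own storage, e.g. a key-value store. */
    	snprintf(buf, len, "nosql backend value of %s", refname);
    	return 0;
    }

    static const struct ref_store_vtable files_backend =
    	{ "files", files_read_ref };
    static const struct ref_store_vtable nosql_backend =
    	{ "nosql", nosql_read_ref };

    int main(void)
    {
    	const struct ref_store_vtable *be = &nosql_backend; /* caller's choice */
    	char value[128];

    	if (!be->read_ref("refs/heads/main", value, sizeof(value)))
    		printf("%s: %s\n", be->name, value);
    	return 0;
    }

That's the level at which a NoSQL-backed ref store, or anything else an
embedder dreams up, could plug in without the rest of the library
caring.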