RE: [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module

Ben Peart <Ben.Peart@xxxxxxxxxxxxx> · Mon, 27 Mar 2017 23:54:36 +0000

> -----Original Message-----
> From: Jonathan Tan [mailto:jonathantanmy@xxxxxxxxxx]
> Sent: Monday, March 27, 2017 3:00 PM
> To: Ben Peart <peartben@xxxxxxxxx>; git@xxxxxxxxxxxxxxx
> Cc: Ben Peart <Ben.Peart@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v2 2/2] sub-process: refactor the filter process code into
> a reusable module
> 
> On 03/24/2017 08:27 AM, Ben Peart wrote:
> > Refactor the filter.<driver>.process code into a separate sub-process
> > module that can be used to reduce the cost of starting up a
> > sub-process for multiple commands.  It does this by keeping the
> > external process running and processing all commands by communicating
> > over standard input and standard output using the packet format (pkt-line)
> > based protocol.
> > Full documentation is in Documentation/technical/api-sub-process.txt.
> 
> Thanks - this looks like something useful to have.

Thanks for the review and feedback.

> 
> When you create a "struct subprocess_entry" to be entered into the system,
> it is not a true "struct subprocess_entry" - it is a "struct subprocess_entry"
> plus some extra variables at the end. Since the sub-process hashmap is
> keyed solely on the command, what happens if another component uses the
> same trick (but with different extra
> variables) when using a sub-process with the same command?

Using the command as the unique key is sufficient: the command is executed as a process by run_command(), and there can't be multiple different processes with the same name.

> 
> I can think of at least two ways to solve this: (i) each component can have its
> own sub-process hashmap, or (ii) add a component key to the hashmap. (i)
> seems more elegant to me, but I'm not sure what the code will look like.
> 
> Also, I saw some minor code improvements possible (e.g. using "starts_with"
> when you're checking for the "status=<foo>" line) but I'll comment on those
> and look into the code more thoroughly once the questions in this e-mail are
> resolved.
> 
> > diff --git a/sub-process.h b/sub-process.h
> > new file mode 100644
> > index 0000000000..d1492f476d
> > --- /dev/null
> > +++ b/sub-process.h
> > @@ -0,0 +1,46 @@
> > +#ifndef SUBPROCESS_H
> > +#define SUBPROCESS_H
> > +
> > +#include "git-compat-util.h"
> > +#include "hashmap.h"
> > +#include "run-command.h"
> > +
> > +/*
> > + * Generic implementation of background process infrastructure.
> > + * See Documentation/technical/api-background-process.txt.
> > + */
> > +
> > + /* data structures */
> > +
> > +struct subprocess_entry {
> > +	struct hashmap_entry ent; /* must be the first member! */
> > +	struct child_process process;
> > +	const char *cmd;
> > +};
> 
> I notice from the documentation (and from "subprocess_get_child_process"
> below) that this is meant to be opaque, but I think this can be non-opaque
> (like "run-command").
> 
> Also, I would prefer adding a "util" pointer here instead of using it as an
> embedded struct. There is no clue here that it is embeddable or meant to be
> embedded.
> 

The structure is intentionally opaque to provide the benefits of encapsulation. Obviously, the "C" language doesn't enforce that design principle, but we do what we can.

The embedded struct follows the same design pattern used elsewhere in git (hashmap, for example), simply for consistency.

> > +
> > +/* subprocess functions */
> > +
> > +typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> > +int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> > +		subprocess_start_fn startfn);
> 
> I'm not sure if it is useful to take a callback here - I think the caller of this
> function can just run whatever it wants after a successful subprocess_start.

The purpose of doing the subprocess-specific initialization via a callback is that if it encounters an error (for example, it can't negotiate a common interface version), subprocess_start() can detect that and ensure the hashmap does not contain the invalid/unusable subprocess.

> 
> Alternatively, if you add the "util" pointer (as I described above), then it
> makes sense to add a subprocess_get_or_start() function (and now it makes
> sense to take the callback). This way, the data structure will own, create, and
> destroy all the "struct subprocess_entry" that it needs, creating a nice
> separation of concerns.
> 
> > +
> > +void subprocess_stop(struct subprocess_entry *entry);
> 
> (continued from above) And it would be clear that this would free
> "entry", for example.
> 
> > +
> > +struct subprocess_entry *subprocess_find_entry(const char *cmd);
> > +
> > +/* subprocess helper functions */
> > +
> > +static inline struct child_process *subprocess_get_child_process(
> > +		struct subprocess_entry *entry)
> > +{
> > +	return &entry->process;
> > +}
> > +
> > +/*
> > + * Helper function that will read packets looking for "status=<foo>"
> > + * key/value pairs and return the value from the last "status" packet
> > + */
> > +
> > +int subprocess_read_status(int fd, struct strbuf *status);
> > +
> > +#endif
> >