On Tue, Feb 01, 2022 at 02:27:54PM -0500, John Cai wrote:
> On 1 Feb 2022, at 12:52, Taylor Blau wrote:
> > I'm not sure that I've seen a response along the lines of "we need to
> > control when the output stream is flushed in order to do ..." yet, but I
> > would be interested to see one before moving too much further ahead of
> > where we already are.
>
> This would be useful when there is another process A interacting with
> a long running git cat-file process B that is retrieving object
> information from the odb interactively but also wants to use --buffer
> mode.

Let me try and repeat my understanding of what you said to make sure
that I fully grok the use-case you have in mind.

You have a repository and want to have a long-running `git cat-file`
process that can serve multiple requests. Because the processes which
interact with your long-running `cat-file` may ask for many objects, you
don't want to flush the output buffer after each object, and so would
ideally like to use `--buffer`. But that doesn't quite work, since the
`cat-file` process may not have decided to flush its output buffer even
when process A is about to go away.

I wonder about the viability of accomplishing this via a signal handler,
i.e., that `cat-file` would call fflush(3) whenever it receives e.g.,
SIGUSR1. A couple of possible downsides:

  - SIGUSR1 doesn't exist on Windows AFAIK.

  - There are definitely going to be synchrony issues to contend with.
    What happens if we receive our signal while writing to the output
    stream? I think you would just need to mark a variable that
    indicates we should flush after finishing serving the current
    request, but I haven't thought too hard about it.

So maybe a signal isn't the way to go. But I don't think `--stdin-cmd`
is the simplest approach either. At the very least, I don't totally
understand your plan after implementing a flush command. You mention
that it would be nice to implement other commands, but I'm not totally
convinced by your examples[1].
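
To make the "mark a variable" idea from the second bullet concrete, here
is a minimal sketch in plain C. The names `flush_requested`,
`install_flush_handler()` and `flush_if_requested()` are made up for
illustration and do not exist in builtin/cat-file.c. The handler only
records the request, and the flush itself happens once the current
request has been served.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag; set from the handler, read from the batch loop. */
static volatile sig_atomic_t flush_requested;

static void on_sigusr1(int signo)
{
	(void)signo;
	/* stdio is not async-signal-safe, so only record the request. */
	flush_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
static int install_flush_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	/* Restart interrupted reads instead of failing them with EINTR. */
	sa.sa_flags = SA_RESTART;
	return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Meant to be called after each request has been fully written.
 * Returns 1 if a pending flush request was honored, 0 otherwise.
 */
static int flush_if_requested(FILE *out)
{
	assert(out != NULL);
	if (!flush_requested)
		return 0;
	flush_requested = 0;
	fflush(out);
	return 1;
}
```

The batch loop would call `flush_if_requested(stdout)` after printing
each object. Note that the flag is only looked at between requests, so a
SIGUSR1 that arrives while we are blocked waiting for the next line of
input would not flush anything until that line shows up.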

I wonder if we could strike a middle ground, which might look like `git
cat-file --batch --buffer`, and just feeding it something which we know
for certain isn't an object identifier. In other words, what if we did
something as simple as:

--- >8 ---

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index d94050e6c1..bae162fc18 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -595,6 +595,11 @@ static int batch_objects(struct batch_options *opt)
 	warn_on_object_refname_ambiguity = 0;
 
 	while (strbuf_getline(&input, stdin) != EOF) {
+		if (!strcmp("<flush>", input.buf)) {
+			fflush(stdout);
+			continue;
+		}
+
 		if (data.split_on_whitespace) {
 			/*
 			 * Split at first whitespace, tying off the beginning

--- 8< ---

On the other hand, something even hackier than the above is that we
flush stdout whenever we get a request to print an object which could
not be found. So if you feed a single "\n" to your `cat-file` process,
you'll get " missing" on its output, and the buffer will immediately be
flushed.

I'm not sure that I'd recommend relying on that behavior exactly, but if
you're looking for a short-term solution, it might work ;).

Thanks,
Taylor

[1]: One that comes to mind is changing the output format mid-stream.
But how often does it really make sense to change the output format? I
can understand wanting to flush at the end of asking cat-file for a
bunch of objects, but I don't see how you would want to change the
output format often enough that shaving off Git's negligible startup
cost is worthwhile (or couldn't be accomplished by just spawning another
cat-file process and using that).
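
For anybody who wants to play with the `<flush>` idea outside of Git,
below is a self-contained toy version of the loop in the diff. It is
only a sketch: `serve_requests()` and `FLUSH_TOKEN` are hypothetical
names, the "<line> ok" reply stands in for real object output, and
requests are assumed to be shorter than the line buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FLUSH_TOKEN "<flush>"

/*
 * Toy stand-in for the batch loop: write one reply per request line
 * to a buffered stream, and flush only when the flush token is read
 * (plus once at EOF). Returns the number of token-triggered flushes.
 */
static int serve_requests(FILE *in, FILE *out)
{
	char line[256];
	int flushes = 0;

	assert(in != NULL && out != NULL);
	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0'; /* drop the trailing newline */
		if (!strcmp(FLUSH_TOKEN, line)) {
			fflush(out);
			flushes++;
			continue;
		}
		fprintf(out, "%s ok\n", line);
	}
	fflush(out); /* nothing should stay buffered once input ends */
	return flushes;
}
```

A caller would write its batch of requests followed by a single
"<flush>" line, and then read back exactly as many replies as it asked
for.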