Re: [PATCH 1/2] cat-file: force flush of stdout on empty string

On Sun, Nov 07 2021, John Cai wrote:

> On Sat, Nov 06, 2021 at 05:01:10AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Fri, Nov 05 2021, Junio C Hamano wrote:
>> 
>> > "John Cai via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>> >
>> >> @@ -405,6 +405,11 @@ static void batch_one_object(const char *obj_name,
>> >>  	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
>> >>  	enum get_oid_result result;
>> >>  
>> >> +	if (opt->buffer_output && obj_name[0] == '\0') {
>> >> +		fflush(stdout);
>> >> +		return;
>> >> +	}
>> >> +
>> >
>> > This might work in practice, but it is bad design taste to add this
>> > change here.  The function is designed to take an object name
>> > string, and it even prepares a flag variable needed to make a call
>> > to turn that object name into object data.  We do not need to
>> > contaminate the interface with "usually this takes an object name,
>> > but there are these other special cases ...".  The higher in the
>> > callchain we place special cases, the better the lower level
>> > functions become, as that allows them to concentrate on doing one
>> > single thing well.
>> >
>> >>  	result = get_oid_with_context(the_repository, obj_name,
>> >>  				      flags, &data->oid, &ctx);
>> >>  	if (result != FOUND) {
>> >> @@ -609,7 +614,11 @@ static int batch_objects(struct batch_options *opt)
>> >>  			data.rest = p;
>> >>  		}
>> >>  
>> >> -		batch_one_object(input.buf, &output, opt, &data);
>> >> +		 /*
>> >> +		  * When in buffer mode and input.buf is an empty string,
>> >> +		  * flush to stdout.
>> >> +		  */
>> >
>> > Checking "do we have the flush instruction (in which case we'd do
>> > the flush here), or do we have textual name of an object (in which
>> > case we'd call batch_one_object())?" here would be far cleaner and
>> > result in easier-to-explain code.  With cleanly written code
>> > to do so, it probably does not even need a new comment here.
>> >
>> > This brings up another issue.  Is "flushing" the *ONLY* special
>> > thing we would ever do in this codepath in the future?  I doubt so.
>> > Squatting on an "empty string" is a selfish design that hurts those
>> > who will come after you in the future, as they need to find other
>> > ways to ask for a "special thing".
>> >
>> > If we are inventing a special syntax that allows us to spell
>> > commands that are distinguishable from a validly-spelled object name
>> > to cause something special (like "flushing the output stream"),
>> > perhaps we want to use a bit more extensible and explicit syntax and
>> > use it from day one?
>> >
>> > For example, if no string that begins with three dots can ever be a
>> > valid way to spell an object name, perhaps "...flush" might be a
>> > better "please do this special thing" syntax than an empty string.
>> > It is easily extensible (the next special thing can follow suit to
>> > say "...$verb" to tell the machinery to $verb the input).  When we
>> > compare between an empty string and "...flush", the latter clearly
>> > is more descriptive, too.
>> >
>> > Note that I offhand do not know if "a valid string that names an
>> > object would never begin with three dots" is true.  Please check
>> > if that is true if you choose to use it, or you can find and use
>> > another convention that allows us to clearly distinguish the
>> > "special" instruction and object names.
>> 
>> I had much the same thought, this is a useful feature, but let's not
>> squat on the one bit of open syntax we have.
>> 
>> John: I think a better direction here is to add a mode to cat-file to
>> emulate what "git update-ref --stdin" supports. Here's a demo of that
>> (also quoted below):
>> https://github.com/git/git/commit/7794f6cfdbdca0dd6bab0dea16193ebf018b86a9
>> 
>> That's on top of some general UI improvements to cat-file I've got
>> locally:
>> https://github.com/git/git/compare/master...avar:avar/cat-file-usage-and-options-handling
>> 
>> That WIP patch on top follows below, of course it's a *lot* more initial
>> scaffolding, but I think once we get past that initial step it's a much
>> better path forward. As noted the code is also almost entirely
>> copy/pasted from update-ref.c, and perhaps some of the shared parts
>> could be moved to some library both could use.
>> 
>> I couldn't think of a better name than --stdin-cmd, suggestions most
>> welcome.
>> 
>> From 7794f6cfdbdca0dd6bab0dea16193ebf018b86a9 Mon Sep 17 00:00:00 2001
>> Message-Id: <patch-1.1-7794f6cfdbd-20211106T040307Z-avarab@xxxxxxxxx>
>> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
>>  <avarab@xxxxxxxxx>
>> Date: Sat, 6 Nov 2021 04:54:04 +0100
>> Subject: [PATCH] WIP cat-file: add a --stdin-cmd mode
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>> 
>> This WIP patch is mostly stealing code from builtin/update-ref.c and
>> implementing the same sort of prefixed command-mode that it
>> supports. I.e. in addition to --batch now supporting:
>> 
>>     <object> LF
>> 
>> With --stdin-cmd it'll support the following, without and with -z, respectively:
>> 
>>     object <object> NL
>>     object <object> NUL
>> 
>> The plus being that we can now implement additional commands. Let's
>> start that by scratching the itch John Cai wanted to address in [1]
>> and implement (without and with -z, respectively):
>> 
>>     fflush NL
>>     fflush NUL
>> 
>> That command simply calls fflush(stdout), which could be done as an
>> emergent effect before by feeding the input a "NL".
>> 
>> I think this will be useful for other things, e.g. I've observed in
>> the past that a non-trivial part of "cat-file --batch" time is spent
>> on parsing its <object> argument and seeing if it's a revision, ref
>> etc.
>> 
>> So we could e.g. add a command that only accepts a full-length 40
>> character SHA-1, or switch the --format output mid-request etc.
>> 
>> 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@xxxxxxxxx/
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> ---
>>  builtin/cat-file.c | 116 ++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 115 insertions(+), 1 deletion(-)
>> 
>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> index b76f2a00046..afdb976c6e7 100644
>> --- a/builtin/cat-file.c
>> +++ b/builtin/cat-file.c
>> @@ -26,7 +26,10 @@ struct batch_options {
>>  	int unordered;
>>  	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
>>  	const char *format;
>> +	int stdin_cmd;
>> +	int end_null;
>>  };
>> +static char line_termination = '\n';
>>  
>>  static const char *force_path;
>>  
>> @@ -507,6 +510,106 @@ static int batch_unordered_packed(const struct object_id *oid,
>>  				      data);
>>  }
>>  
>> +enum batch_state {
>> +	/* Non-transactional state open for commands. */
>> +	BATCH_STATE_OPEN,
>> +};
>> +
>> +static void parse_cmd_object(struct batch_options *opt,
>> +			     const char *next, const char *end,
>> +			     struct strbuf *output,
>> +			     struct expand_data *data)
>> +{
>> +	size_t len = end - next - 1;
>> +	char *p = (char *)next;
>> +	char old = p[len];
>> +
>> +	p[len] = '\0';
>> +	batch_one_object(next, output, opt, data);
>> +	p[len] = old;
>> +}
>> +
>> +static void parse_cmd_fflush(struct batch_options *opt,
>> +			     const char *next, const char *end,
>> +			     struct strbuf *output,
>> +			     struct expand_data *data)
>> +{
>> +	if (*next != line_termination)
>> +		die("fflush: extra input: %s", next);
>> +	fflush(stdout);
>> +}
>> +
>> +static const struct parse_cmd {
>> +	const char *prefix;
>> +	void (*fn)(struct batch_options *, const char *, const char *, struct strbuf *, struct expand_data *);
>> +	unsigned args;
>> +	enum batch_state state;
>> +} command[] = {
>> +	{ "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
>> +	{ "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
>> +};
>
> I think overall this approach is cleaner and makes sense. My only
> question is, are there more commands in the future that will need some
> special command syntax? Just wondering whether YAGNI applies here.

An obvious addition is the ability to set the various options on the
fly. Right now, if you're using --batch-check and want the content
instead, you need to kill the process and restart it with --batch.
Ditto for --textconv.

E.g. the gitaly backend for gitlab.com keeps two cat-file processes
around just to flip-flop between those two modes: sometimes you want the
content, sometimes you're just checking if the object exists.
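
As a rough illustration of what "one process, both modes" could look
like on the parsing side, here is a minimal standalone sketch. Note that
the "info" and "contents" verbs, struct request and parse_request() are
all invented for this example; nothing like them exists in cat-file or
in the WIP patch. It also assumes the line terminator has already been
stripped from the input.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: the kind of reply one long-lived process would give. */
enum reply_kind { REPLY_INFO, REPLY_CONTENTS, REPLY_ERROR };

struct request {
	enum reply_kind kind;
	const char *object;	/* points into the caller's input line */
};

/*
 * Parse "info <object>" or "contents <object>". Both verbs are made up
 * for this sketch; they stand in for --batch-check and --batch style
 * replies selected per request instead of per process.
 */
static struct request parse_request(const char *line)
{
	struct request req = { REPLY_ERROR, NULL };

	assert(line);
	if (!strncmp(line, "info ", 5) && line[5]) {
		req.kind = REPLY_INFO;
		req.object = line + 5;
	} else if (!strncmp(line, "contents ", 9) && line[9]) {
		req.kind = REPLY_CONTENTS;
		req.object = line + 9;
	}
	return req;
}
```

With something along those lines a caller would not need a second
process just to alternate between the two kinds of reply.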

I'd also like to add something to expose the likes of -e and -t
directly. Even with --batch-check you often just want to check
existence, but at other times you want the size too. You could supply a
format, but as above, what you want varies from request to request, and
killing and starting a new process just for that is a hassle...
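
To make the extensibility point concrete, the table-driven dispatch in
the WIP patch boils down to something like the following. This is a
standalone sketch and not the patch itself: struct cmd_ctx,
dispatch_line() and the cmd_result codes are names made up here, and the
"object" handler only records its argument instead of calling
batch_one_object(). It assumes the line terminator has already been
stripped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes; the WIP patch die()s instead. */
enum cmd_result { CMD_OK, CMD_UNKNOWN, CMD_EXTRA_INPUT, CMD_MISSING_ARG };

/* Hypothetical state, only here so the sketch has observable effects. */
struct cmd_ctx {
	int objects_seen;
	int flushes;
	char last_object[128];
};

typedef enum cmd_result (*cmd_fn)(struct cmd_ctx *ctx, const char *arg);

static enum cmd_result cmd_object(struct cmd_ctx *ctx, const char *arg)
{
	if (!*arg)
		return CMD_MISSING_ARG;
	/* Stand-in for batch_one_object(): just remember the name. */
	snprintf(ctx->last_object, sizeof(ctx->last_object), "%s", arg);
	ctx->objects_seen++;
	return CMD_OK;
}

static enum cmd_result cmd_fflush(struct cmd_ctx *ctx, const char *arg)
{
	if (*arg)
		return CMD_EXTRA_INPUT;
	fflush(stdout);
	ctx->flushes++;
	return CMD_OK;
}

static const struct {
	const char *prefix;
	cmd_fn fn;
} commands[] = {
	{ "object", cmd_object },
	{ "fflush", cmd_fflush },
};

/* Match "<prefix>" or "<prefix> <arg>" and hand the rest to the handler. */
static enum cmd_result dispatch_line(struct cmd_ctx *ctx, const char *line)
{
	size_t i;

	assert(ctx && line);
	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
		size_t n = strlen(commands[i].prefix);

		if (strncmp(line, commands[i].prefix, n))
			continue;
		if (line[n] == '\0')
			return commands[i].fn(ctx, line + n);
		if (line[n] == ' ')
			return commands[i].fn(ctx, line + n + 1);
	}
	return CMD_UNKNOWN;
}
```

Under these assumptions, adding a new command is one handler plus one
row in the table.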



