Re: [PATCH v2 00/10] send-email: various optimizations to speed up by >2x

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 21 May 2021 08:23:15 +0200

On Fri, May 21 2021, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
>> I like all of this, except for the change in the interface of
>> Git::config_regexp(). You mention that it's new-ish, and probably not in
>> wide use. And I agree that's probably true. But it feels like violating
>> a principle of not breaking APIs, and we should stick to that principle
>> and not bend it for "well, it's not that old an API".
>>
>> I'd find it more compelling if it the existing interface was broken or
>> hard to avoid changing. But couldn't we just add a new function with the
>> extra info (config_regexp_with_values() or something)?
>
> We seem to use it in-tree only once, but we have no control over
> what out-of-tree users have been doing.

I thought people might have bought my argument in v1 that we just
document these as "public" as part of some copy/pasting in Git.pm, but
sure, I'll re-roll and make this a new function or something...

> I do like your proposed name for the fact that it has _with_values
> in it; the original gave us only a list of configuration variable
> names, and now we are getting name-value tuples.
>
> BUT
>
> I am not sure if I like the new interface, and I do not know if the
> the name "config_with_values" is a good match for what it does.
>
> Namely, when a function is described as:
>
>     =item config_regexp ( RE )
>
>     Retrieve the list of configuration key names matching the regular
>     expression C<RE>. The return value is an ARRAY of key-value pairs.
>
> I would expect it to return ([key1, value1], [key2, value2], ...),
> but the implementation returns (key1, value1, key2, value2, ...),
> i.e. a flattened list, if I am not mistaken.

...

>     my @cmd = ('config', '--null', '--get-regexp', $regex);
>     unshift @cmd, $self if $self;
>     my $data = command(@cmd);
>     my (@kv) = map { split /\n/, $_, 2 } split /\0/, $data;
>     return @kv;
>
> We get NUL terminated records, so we first split the entire output
> at NULs (to get a list of key-value records).  Each key-value record
> has the key followed by LF followed by value, so we split it at the
> first LF (i.e. a value with an embedded LF would behave correctly)
> to extract key and value out of it.  But the resulting list is a
> flat list with alternating key and value.
>
> The side that works on the returned value of courese knows that it
> is getting a flattened list and acts accordingly:
>
>     my @kv = Git::config_regexp("^sende?mail[.]");
>     while (my ($k, $v) = splice @kv, 0, 2) {
>         push @{$known_config_keys{$k}} => $v;
>     }
>
> Perhaps it may be more performant than a more obvious and
> straight-forward "list of tuples", i.e.
>
>     return map { [split /\n/, $_, 2] } split /\0/, $data;
>
>     my @kv = Git::config_regexp("^sende?mail[.]");
>     for my $tuple (@kv) {
>         push @{$known_config_keys{$tuple->[0]}}, $tuple->[1];
>     }
>
> but somehow the flattened list felt unnatural at least to a naïve
> reader like me.

The "performant" really doesn't matter here, we're comparing shelling
out to getting a small number of config keys back. So I wasn't trying to
optimize this.

Returning a flattened list is idiomatic in Perl, it means that a caller
can do any of:

    # I only care about the last value for a key, or only about
    # existence checks
    my %hash = func();

Or:

    # I want all key-values to iterate over
    my @kv = func();

Returning touples like this makes that less convenient for both, who'll
need to do more work to unpack them.

For what it's worth In Perl "return a list" means "flattened list", the
term "flattened list" I think comes from other languages. You'd call
what you're suggesting a list of arrays, or (if a top-level reference)
an array of arrays, AoA for short, AoH for array (ref) of hashes etc.