Re: [PATCH v2 00/10] send-email: various optimizations to speed up by >2x

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 29 May 2021 10:19:07 +0200

On Fri, May 28 2021, Felipe Contreras wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> Returning a flattened list is idiomatic in Perl, it means that a caller
>> can do any of:
>> 
>>     # I only care about the last value for a key, or only about
>>     # existence checks
>>     my %hash = func();
>
> I was staying on the sideline because I don't know what's idiomatic in
> Perl, but Perl and Ruby share a lot in common (one could say Perl is the
> grandfather of Ruby), and I do know very well what's idiomatic in Ruby.
>
> In perl you can do $ENV{'USER'}, and:
>
>   while (my ($k, $v) = each %ENV) {
>     print "$k = $v\n";
>   }
>
> Obviously it's idiomatic to use hashes this way [1].

For what it's worth, "idiomatic" (or "a good idea") and "has an example
in the Perl documentation" unfortunately aren't always aligned. A lot of
experienced Perl programmers avoid each() like the plague:
http://blogs.perl.org/users/rurban/2014/04/do-not-use-each.html
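
To make one of the usual each() complaints concrete, here is a minimal
sketch. The hash contents are made up; the point is only that each()
keeps a single iterator per hash, so leaving a loop early affects the
next loop over the same hash:

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

# Leave the loop after the first pair. The hash's internal iterator
# is not reset by "last".
while (my ($k, $v) = each %h) {
    last;
}

# A later each() loop on the same hash resumes where the first one
# stopped, so it silently skips the pair that was already consumed.
my $seen = 0;
while (my ($k, $v) = each %h) {
    $seen++;
}

my $total = keys %h;   # scalar context: number of keys (also resets the iterator)

print "$seen of $total\n";   # prints "2 of 3"
```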

> It was a waste for Git::config_regexp to not do the sensible thing here.

FWIW we're commenting on a v2 of a series that's at v5 now, and which
doesn't use config_regexp() at all; the relevant code is inlined in
git-send-email.perl now.

> You can do exactly the same in Ruby: ENV['USER']
>
>   ENV.each { |k, v| print "#{k} = #{v}\n" }
>
> And the way I would parse these configurations in Ruby is something like:
>
>   c = `git config -l -z`.split("\0").map { |e| e.split("\n") }.to_h
>   c['sendemail.smtpserver']
>
> And this just gave me an idea...

I'd probably do it that way in Ruby, but not in Perl.

Things that superficially look the same in two languages can have
completely different behaviors; a "hash" isn't a single type of data
structure in these programming languages.

In particular Ruby doesn't have hashes in the Perl sense of the word, it
has an ordered key-value pair structure (IIRC under the hood they're
hashes + a doubly linked list).

Thus you can use it for things like parsing a key=>value text file where
the key is unique and the order is important.

In Perl hashes are only meant for key-value lookup. They are not
ordered, and are in fact actively randomly ordered for security
reasons. In any modern version inserting a new key can have an avalanche
effect of completely changing the order. It's not even stable across
invocations:

    $ perl -wE 'my %h; for ("a".."z") { $h{$_} = $_; say keys %h }'
    a
    ab
    bca
    dcba
    daebc
    cbaedf
    aecbfdg
    dgfcbaeh
    [...]

The other important distinction (but I'm not so sure about Ruby here) is
that Perl doesn't have any way to pass a hash or any other structure to
another function as such; everything is flattened and pushed onto the
stack.

To pass a "hash" you're not passing the hash itself, but either its
flattened keys and values, or a reference to it, which is just a single
scalar on the stack.
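
A minimal sketch of that difference, with made-up sub names
(count_args() and count_keys() are purely illustrative):

```perl
use strict;
use warnings;

# Hypothetical helpers, only for illustration.
sub count_args { return scalar @_ }                # counts the items it was passed
sub count_keys { my ($h) = @_; return scalar keys %$h }  # expects one hash reference

my %h = (a => 1, b => 2, c => 3);

my $flat   = count_args(%h);    # the hash is flattened: 3 keys + 3 values
my $by_ref = count_args(\%h);   # a reference is a single scalar argument
my $nkeys  = count_keys(\%h);   # the callee dereferences it to get the hash back

print "$flat $by_ref $nkeys\n";   # prints "6 1 3"
```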

Thus passing and making use of these flattened values is idiomatic in
Perl in a way that doesn't exist in a lot of other languages. In some
other languages a function has to choose whether it's returning an array
or a hash, in Perl you can just push the "flattened" items that make up
the array on the stack, and have the caller decide if they're pushing
those stack items into an array, or to a hash if they expect it to be
meaningful as key-value pairs.
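
As a rough illustration, here is a sketch with a hypothetical sub,
flat_pairs(), that returns a plain list and leaves the interpretation
to the caller:

```perl
use strict;
use warnings;

# Hypothetical sub that returns a flat list: key, value, key, value, ...
sub flat_pairs {
    return (a => 1, b => 2, a => 3);
}

# The caller decides what the returned list means.
my @as_array = flat_pairs();   # all six items, in the order returned
my %as_hash  = flat_pairs();   # key-value lookup; the later "a" wins

print scalar(@as_array), "\n";        # prints 6
print $as_hash{a}, "\n";              # prints 3
print scalar(keys %as_hash), "\n";    # prints 2
```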

In the context of Git's config format that is a perfect fit. Our config
values *are* ordered, but they are also sort-of hashes, and whether it's
"all values" or "last value wins" (or anything else, those are just the
common ones) depends on the key/user.

So by having a list of key-value pairs on the stack you can choose to
put it into an array if you don't want to lose information, or put it
into a hash if all you care about is "last value wins", or "I'm going to
check for key existence".
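
Sketched with made-up config data (this is not the code in the series,
just an illustration of the two readings of the same flat list):

```perl
use strict;
use warnings;

# Made-up data standing in for parsed config, as a flat list of
# key, value pairs in the order they were read.
my @config = (
    'sendemail.smtpserver' => 'old.example.com',
    'sendemail.cc'         => 'a@example.com',
    'sendemail.cc'         => 'b@example.com',
    'sendemail.smtpserver' => 'new.example.com',
);

# "Last value wins" and existence checks: plain assignment to a hash.
my %last = @config;

# "All values": consume the list two items at a time and keep
# every value per key, in order.
my %all;
my @rest = @config;
while (my ($k, $v) = splice(@rest, 0, 2)) {
    push @{ $all{$k} }, $v;
}

print $last{'sendemail.smtpserver'}, "\n";           # prints new.example.com
print join(",", @{ $all{'sendemail.cc'} }), "\n";    # prints a@example.com,b@example.com
```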

I think that in many other languages that wouldn't make any sense, and
you'd always return a structure like:

    [
         key => [zero or more values],
        [...]
    ]

Or whatever. The caller can also unambiguously interpret those, but
unlike in Perl you'd need to write something to explicitly iterate the
returned value (or a helper) to get it into a hash or a "flattened"
array. In Perl it's trivial due to the "everything is on the stack"
semantics.
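
For contrast, here is a sketch of the explicit iteration a caller would
need if handed the nested shape instead, written in Perl with made-up
data:

```perl
use strict;
use warnings;

# Made-up data in the nested shape: key, [values], key, [values], ...
my $nested = [
    'sendemail.cc'         => [ 'a@example.com', 'b@example.com' ],
    'sendemail.smtpserver' => [ 'mail.example.com' ],
];

# The caller has to walk the structure explicitly to build either a
# "last value wins" hash or a flattened key, value list.
my (%last, @flat);
for (my $i = 0; $i < @$nested; $i += 2) {
    my ($key, $values) = @{$nested}[$i, $i + 1];
    $last{$key} = $values->[-1] if @$values;
    push @flat, map { ($key => $_) } @$values;
}

print $last{'sendemail.cc'}, "\n";   # prints b@example.com
print scalar(@flat), "\n";           # prints 6
```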

Anyway, all that being said, the part we're talking about is a trivial
part of this larger series. I'd much prefer to have it just land as
"good enough" at this point. It works, and we can always tweak it
further later if there's a need to do that.