Re: [PATCH v5 10/12] Add a base implementation of SHA-256 support

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 10 Nov 2018 16:52:37 +0100

On Wed, Nov 07 2018, brian m. carlson wrote:

> On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> On Sun, Nov 04 2018, brian m. carlson wrote:
>> > +	{
>> > +		"sha256",
>> > +		/* "s256", big-endian */
>>
>> The existing entry/comment for sha1 is:
>>
>> 		"sha1",
>> 		/* "sha1", big-endian */
>>
>> So why the sha256/s256 difference in the code/comment? Wondering if I'm
>> missing something and we're using "s256" for something.
>
> Ah, good question.  The comment refers to the format_id field which
> follows this comment.  The value is the big-endian representation of
> "s256" as a 32-bit value.  I picked that over "sha2" to avoid confusion
> in case we add SHA-512 in the future, since that's also an SHA-2
> algorithm.
>
> Config files and command-line interfaces will use "sha1" or "sha256",
> and binary formats will use those 32-bit values ("sha1" or "s256").

Okey.

>> >  const char *empty_tree_oid_hex(void)
>> > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
>> > [...]
>>
>> I had a question before about whether we see ourselves perma-forking
>> this implementation based off libtomcrypt, as I recall you said yes.
>
> Yes.
>
>> Still, I think it would be better to introduce this in at least two-four
>> commits where the upstream code is added as-is, then trimmed down to
>> size, then adapted to our coding style, and finally we add our own
>> utility functions.
>
> At this point, the only code that's actually used from libtomcrypt is
> the block transform.  The upstream code is split over multiple files in
> multiple directories and won't compile in our codebase without many
> files and a lot of work, so I don't feel good about either including
> code that doesn't compile or including large numbers of files that don't
> meet our coding standards (and that may still not compile because of
> platform issues).
>
>> It'll make it easier to forward-port any future upstream changes.
>
> I don't foresee many, if any, changes to this code.  It either
> implements the specification or it doesn't, and it's objectively easy to
> determine which.  There's not even an argument to port performance
> improvements, since almost everyone will be using a crypto library to
> provide this code because libraries perform so dramatically better.
> I've tried to not make the code perform worse than it did originally,
> but that's it.
>
> Furthermore, the modified code carries a relatively small amount of
> resemblance to the original, so if we did port changes forward, we'd
> probably have conflicts.
>
> It seems like you really want to include the upstream code as a separate
> commit and I understand where you're coming from with wanting to have
> this split out into logical commits, but due to the specific nature of
> this code, I see a lot of downsides and not a lot of upsides.

Yeah sorry to keep bringing this up. Your way makes sense, and I'd
forgotten the details since last time . I'll shut up about it:)

>> > +	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
>> > +		test-tool sha256 >actual &&
>> > +	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
>> > +	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
>> > +		test-tool sha256 >actual &&
>>
>> I've been wanting to make use depend on perl >= 5.10 (previous noises
>> about that on-list), but for now we claim to support >=5.8, which
>> doesn't have the -E switch.
>
> Good point.  I'll fix that.  After having written a lot of one-liners,
> I always write -E, and this was originally a one-liner.
>
>> But most importantly you aren't even using -E features here, and this
>> isn't very idoimatic Perl. Instead do, respectively:
>>
>>     perl -e 'print q{aaaaaaaaaa} x 100000'
>>     perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 100000"
>
> I considered the more idiomatic version originally, but the latter could
> allocate a decent amount of memory in one chunk, and I wanted to avoid
> that.

~2.5MB for the latter, so trivial.

> I think what I'd like to do, actually, is turn on autoflush and
> use a postfix for, which would be more idiomatic and could potentially
> provide better testing of the chunking code.  I'll add a comment to that
> effect.

Okey. Maybe better to use syswrite() instead, or maybe print with
autoflush is more idiomatic, I don't know.