Re: [PATCH] Add a birdview-on-the-source-code section to the user manual

Daniel Barkalow <barkalow@xxxxxxxxxxxx> · Wed, 9 May 2007 00:54:03 -0400 (EDT)

On Wed, 9 May 2007, Johannes Schindelin wrote:

> Hi,
> 
> On Tue, 8 May 2007, Karl Hasselström wrote:
> 
> > On 2007-05-08 23:07:04 +0200, Johannes Schindelin wrote:
> > 
> > > On Tue, 8 May 2007, Karl Hasselström wrote:
> > >
> > > > On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> > > >
> > > > > +  char *`, but is actually expected to be a pointer to `unsigned
> > > > > +  char[20]`.  This variable will contain the big endian version of the
> > > > > +  40-character hex string representation of the SHA-1.
> > > >
> > > > Either it should be "unsigned char[40]" (or possibly 41 with a
> > > > terminating \0), or else you shouldn't be talking about
> > > > hexadecimal since it's just a 20-byte big-endian unsigned integer.
> > > > (A third possibility is that I'm totally confused.)
> > >
> > > It is 40 hex-character, but 20 _byte_. If you have any ideas how to
> > > formulate that better than I did...
> > 
> > I think this is less confusing:
> > 
> >   This variable will contain the 160-bit SHA-1.
> > 
> > It avoids talking of hex, since it's not really stored in hex format
> > any more than any other binary number with a number of bits divisible
> > by four. And it avoids saying big-endian, which is not relevant anyway
> > since we don't use hashes as integers.
> 
> Well, I do not buy into that. First, we _have_ to say that it is 
> big-endian. It was utterly confusing to _me_ that the hash was not little 
> endian, as I expected on an Intel processor.

SHA-1 is defined as producing an octet sequence, with a canonical 
conversion to a hex digit sequence that puts the high nibble of each 
octet first. Internally, it is canonically specified using big-endian 
math, but the same algorithm could equally be specified with 
little-endian math and different rules for input and output.
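
To make the "high nibbles first" rule concrete, here is a minimal sketch 
of the octet-to-hex conversion. The name octets_to_hex() is made up for 
illustration and is not git's own helper. Note that nothing in it depends 
on the byte order of the machine: it only ever looks at one octet at a 
time.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of git's API: write the hex form of
 * `len` octets into `out`, which must have room for 2 * len + 1 chars.
 */
static void octets_to_hex(const unsigned char *octets, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i]     = digits[octets[i] >> 4];   /* high nibble first */
		out[2 * i + 1] = digits[octets[i] & 0x0f]; /* then the low nibble */
	}
	out[2 * len] = '\0';
}
```

For a 20-octet SHA-1, `out` would need 41 chars: 40 hex digits plus the 
terminating NUL.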

> And I'd rather mention the hex representation (what you see in git-log and 
> git-ls-tree). This helps debugging, believe me.

It's kind of important to distinguish between the hex representation and 
the octet representation, because your code will not work at all if you 
use the wrong one. And "unsigned char *" or "unsigned char[20]" is always 
the octets; the hex is always "char *". Primarily mentioning the one that 
is more intuitive but less frequently used doesn't help with understanding 
the actual code.
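
As a rough illustration of the two types, here is a sketch of going from 
the hex form ("char *", 40 digits) to the octet form ("unsigned 
char[20]"). Both function names are hypothetical stand-ins chosen for 
this example, not the helpers git actually uses.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not one. */
static int hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: parse 40 hex digits into 20 octets.
 * Returns 0 on success, -1 on any non-hex character, including a NUL
 * that ends the string early.
 */
static int hex_to_octets(const char *hex, unsigned char *sha1)
{
	size_t i;

	for (i = 0; i < 20; i++) {
		int hi, lo;

		hi = hex_digit_value(hex[2 * i]);
		if (hi < 0)
			return -1;
		lo = hex_digit_value(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Passing the 40-character string where the 20 octets are expected (or the 
other way around) will typically still compile after a cast, which is 
exactly why the documentation should keep the two apart.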

	-Daniel
*This .sig left intentionally blank*