Hi,

On Wed, 9 May 2007, Daniel Barkalow wrote:

> On Wed, 9 May 2007, Johannes Schindelin wrote:
>
> > On Tue, 8 May 2007, Karl Hasselström wrote:
> >
> > > On 2007-05-08 23:07:04 +0200, Johannes Schindelin wrote:
> > >
> > > > On Tue, 8 May 2007, Karl Hasselström wrote:
> > > >
> > > > > On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> > > > >
> > > > > > + char *`, but is actually expected to be a pointer to `unsigned
> > > > > > + char[20]`. This variable will contain the big endian version of the
> > > > > > + 40-character hex string representation of the SHA-1.
> > > > >
> > > > > Either it should be "unsigned char[40]" (or possibly 41 with a
> > > > > terminating \0), or else you shouldn't be talking about
> > > > > hexadecimal since it's just a 20-byte big-endian unsigned integer.
> > > > > (A third possibility is that I'm totally confused.)
> > > >
> > > > It is 40 hex characters, but 20 _bytes_. If you have any ideas how
> > > > to formulate that better than I did...
> > >
> > > I think this is less confusing:
> > >
> > >     This variable will contain the 160-bit SHA-1.
> > >
> > > It avoids talking of hex, since it's not really stored in hex format
> > > any more than any other binary number with a number of bits divisible
> > > by four. And it avoids saying big-endian, which is not relevant anyway
> > > since we don't use hashes as integers.
> >
> > Well, I do not buy into that. First, we _have_ to say that it is
> > big-endian. It was utterly confusing to _me_ that the hash was not
> > little-endian, as I expected on an Intel processor.
>
> SHA-1 is defined as producing an octet sequence, and to have a canonical
> hex digit sequence conversion with the high nibbles first. Internally,
> it is canonically specified using big-endian math, but the same
> algorithm could equally be specified with little-endian math and
> different rules for input and output.
>
> > And I'd rather mention the hex representation (what you see in git-log
> > and git-ls-tree). This helps debugging, believe me.
>
> It's kind of important to distinguish between the hex representation and
> the octet representation, because your code will not work at all if you
> use the wrong one. And "unsigned char *" or "unsigned char[20]" is
> always the octets; the hex is always "char *". Primarily mentioning the
> one that is more intuitive but less frequently used doesn't help with
> understanding the actual code.

That's a really good idea: to point out that "unsigned char *" refers to
the octets, while "char *" refers to the ASCII representation. I will add
this, together with a simple example (the initial commit); see the rough
sketch in the P.S. below.

Ciao,
Dscho
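
P.S.: Here is a first rough sketch of such an example, as a standalone
program rather than one using git's own helpers (hexval() and
hex_to_sha1() are made up for this sketch; in git proper you would call
get_sha1_hex() and sha1_to_hex() instead). The hash used here is
git.git's initial commit:

	#include <stdio.h>

	/*
	 * Value of one hex digit, or -1 if it is not a hex digit.
	 * (git prints hashes in lowercase, so that is all we handle.)
	 */
	static int hexval(char c)
	{
		if (c >= '0' && c <= '9')
			return c - '0';
		if (c >= 'a' && c <= 'f')
			return c - 'a' + 10;
		return -1;
	}

	/*
	 * Parse a 40-character hex string ("char *") into the 20 raw
	 * octets ("unsigned char[20]").  Returns 0 on success, -1 on
	 * malformed input.
	 */
	static int hex_to_sha1(const char *hex, unsigned char *sha1)
	{
		int i;
		for (i = 0; i < 20; i++) {
			int hi = hexval(hex[2 * i]);
			int lo = hexval(hex[2 * i + 1]);
			if (hi < 0 || lo < 0)
				return -1;
			/* high nibble first: the "big-endian" part */
			sha1[i] = (unsigned char)(hi << 4 | lo);
		}
		return 0;
	}

	int main(void)
	{
		/* git.git's initial commit, as you see it in git-log */
		const char *hex = "e83c5163316f89bfbde7d9ab23ca2e25604af290";
		unsigned char sha1[20];
		int i;

		if (hex_to_sha1(hex, sha1))
			return 1;

		/* One octet corresponds to _two_ hex characters: */
		printf("first octet: 0x%02x\n", sha1[0]);

		/* And back again: 20 octets -> 40 hex characters. */
		for (i = 0; i < 20; i++)
			printf("%02x", sha1[i]);
		printf("\n");
		return 0;
	}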