Re: [PATCH 2/3] t: add t0016-oidmap.sh

SZEDER Gábor <szeder.dev@xxxxxxxxx> · Sun, 9 Jun 2019 23:21:30 +0200

On Sun, Jun 09, 2019 at 10:24:55PM +0200, Christian Couder wrote:
> On Sun, Jun 9, 2019 at 11:23 AM SZEDER Gábor <szeder.dev@xxxxxxxxx> wrote:
> >
> > On Sun, Jun 09, 2019 at 06:49:06AM +0200, Christian Couder wrote:
> > > +
> > > +test_oidmap() {
> > > +     echo "$1" | test-tool oidmap $3 > actual &&
> > > +     echo "$2" > expect &&
> >
> > Style nit: space between redirection op and filename.
> 
> Thanks for spotting this. It's fixed in my current version.
> 
> > > +test_oidhash() {
> > > +     git rev-parse "$1" | perl -ne 'print hex("$4$3$2$1") . "\n" if m/^(..)(..)(..)(..).*/;'
> >
> > New Perl dependencies always make Dscho sad... :)
> 
> Yeah, I was not sure how to do it properly in shell so I was hoping I
> would get suggestions about this. Thanks for looking at this!
> 
> I could have hardcoded the values as it is done in t0011-hashmap.sh,
> but I thought it was better to find a function that does he job.

Well, I'm fine with hardcoding the expected hash values (in network
byte order) as well, because then we won't add another git process
upstream of a pipe that would pop up during audit later...

> > So, 'test oidmap' from the previous patch prints the value we want to
> > check with:
> >
> >     printf("%u\n", sha1hash(oid.hash));
> 
> Yeah, I did it this way because "test-hashmap.c" does the same kind of
> thing to print hashes:
> 
>             printf("%u %u %u %u\n",
>                    strhash(p1), memhash(p1, strlen(p1)),
>                    strihash(p1), memihash(p1, strlen(p1)));
> 
> > First, since object ids inherently make more sense as hex values, it
> > would be more appropriate to print that hash with the '%x' format
> > specifier,
> 
> I would be ok with that, but then I think it would make sense to also
> print hex values in "test-hashmap.c".
> 
> > and then we wouldn't need Perl's hex() anymore, and thus
> > could swap the order of the first four bytes in oidmap's hash without
> > relying on Perl, e.g. with:
> >
> >   sed -e 's/^\(..\)\(..\)\(..\)\(..\).*/\4\3\2\1/'
> >
> > Second, and more importantly, the need for swapping the byte order
> > indicates that this test would fail on big-endian systems, I'm afraid.
> > So I think we need an additional bswap32() on the printing side,
> 
> Ok, but then shouldn't we also use bswap32() in "test-hashmap.c"?

No.  The two test scripts/helpers work with different hashes.  t0011
and 'test-hashmap.c' uses the various FNV-1-based hash functions
(strhash(), memhash(), ...) to calculate an unsigned int hash of the
items stored in the hashmap, therefore their hashes will be the same
regardless of endianness.  In an oidmap, however, the hash is simply
the first four bytes of the object id as an unsigned int as is, and
look at how sha1hash() does it, and indeed at the last sentence of the
comment in front of it:

 * [...] Note that
 * the results will be different on big-endian and little-endian
 * platforms, so they should not be stored or transferred over the net.
 */
static inline unsigned int sha1hash(const unsigned char *sha1)
{
        /*
         * Equivalent to 'return *(unsigned int *)sha1;', but safe on
         * platforms that don't support unaligned reads.
         */
        unsigned int hash;
        memcpy(&hash, sha1, sizeof(hash));
        return hash;
}

> By the way it seems that we use ntohl() or htonl() instead of
> bswap32() in the source code.

OK.

> > and then could further simplify 'test_oidhash':
> >
> > diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c
> > index 0ba122a264..4177912f9a 100644
> > --- a/t/helper/test-oidmap.c
> > +++ b/t/helper/test-oidmap.c
> > @@ -51,7 +51,7 @@ int cmd__oidmap(int argc, const char **argv)
> >
> >                         /* print hash of oid */
> >                         if (!get_oid(p1, &oid))
> > -                               printf("%u\n", sha1hash(oid.hash));
> > +                               printf("%x\n", bswap32(sha1hash(oid.hash)));
> >                         else
> >                                 printf("Unknown oid: %s\n", p1);
> >
> > diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
> > index 3a8e8bdb3d..9c0d88a316 100755
> > --- a/t/t0016-oidmap.sh
> > +++ b/t/t0016-oidmap.sh
> > @@ -22,10 +22,10 @@ test_expect_success 'setup' '
> >  '
> >
> >  test_oidhash() {
> > -       git rev-parse "$1" | perl -ne 'print hex("$4$3$2$1") . "\n" if m/^(..)(..)(..)(..).*/;'
> > +       git rev-parse "$1" | cut -c1-8
> >  }
> >
> > -test_expect_success PERL 'hash' '
> > +test_expect_success 'hash' '
> 
> Yeah, I agree that it seems better to me this way.