Re: Pack files, standards compliance, and efficiency

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 05, 2015 at 01:41:21AM +0000, brian m. carlson wrote:

> However, with the object_id conversion, we run into a problem: casting
> those unsigned char * values into struct object_id * values is not
> allowed by the C standard.  There are two possible solutions: copying;
> and the just-do-it solution, where we cast and hope for the best.

I'm not sure if it does violate the standard. The address of the first
element of a struct is guaranteed to match the address of the struct
itself. The object_id.hash member is an array of unsigned char, so there
are no alignment issues. It might run afoul of rules about casting
between pointer types (i.e., pointers for all types are not guaranteed
to be the same size). The standard dictates that "char *" and "void *"
are the same size, and big enough to hold a pointer to anything, so I
think it might even be OK.

But I'm not even sure that line of thinking is all that interesting.
Even if we are violating some dark corner of the standard, this
definitely falls into the "it's useful and works on all sane machines"
category. We also do much worse things with struct-casting mmap'd data
elsewhere (e.g., see the use of "struct pack_header"). It works fine in
practice as long as you are careful about alignment and padding issues.

So my vote would be to retain the cast. This is very low-level,
performance-sensitive code. I did some very naive timings and didn't see
any measurable change from your patch, but I also don't think we are
seeing a real portability benefit to moving to the copy, so I'd prefer
to keep the status quo.

> It looks like we use the latter in nth_packed_object_offset, where we
> cast an unsigned char * value to uint32_t *, which is clearly not
> allowed.

Yes, this one is definitely dubious by the standard. However, it works
in practice because the index format is designed to be 4-byte aligned.
By contrast, the .bitmap format is not, and we have to use get_be32, etc
(which is really not the end of the world, but I do not think there is
any real reason to change the .idx code at this point).

> I'm to the point where I need to make a decision on how to
> proceed, and I've included a patch with the copying conversion of
> nth_packed_object_sha1 below for comparison.  The casting solution is
> obviously more straightforward.  I haven't tested either implementation
> for performance.

The test I did was just running "git rev-list --use-bitmap-index --count
HEAD" on a bitmapped linux.git repo. That's where I'd expect it to show
the most, because we are not doing much other work. But I think even
still, the timing is dominated by loading the bitmap file.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]