Re: win2k/cygwin cannot handle even moderately sized packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shawn Pearce, Tue, Nov 07, 2006 19:56:48 +0100:
> I've pushed the changes out to repo.or.cz:
> 
> 	http://repo.or.cz/w/git/fastimport.git
> 
> in the window-mapping branch.  Note that this is based on a slightly
> older version of Git (v1.4.2).  There are two "tuneables" on line 376
> of sha1_file.c, this is the maximum amount of memory (in bytes) to
> denote to packs and the maximum chunk size of each pack (in bytes).

Thanks.
I couldn't help noticing that the interface to the packs data is
a bit complex:

    unsigned char *use_pack(struct packed_git *p,
			    struct pack_window **window,
			    unsigned long offset,
			    unsigned int *left);
    void unuse_pack(struct pack_window **w);

Can I suggest something like below?

    unsigned char *use_pack(struct packed_git *p,
			    off_t offset, size_t size, size_t *mapped);
    void unuse_pack(struct packed_git *p, off_t offset, size_t size);
    or
    void unuse_pack(struct packed_git *p, unsigned char *data);

    (where size/maxsize is the amount of bytes at the offset in the
    pack file the caller was asking for. use_pack would fail if offset
    or size do not pass sanity checks (offset past end of file, size
    is 0).  The mapped argument would get the length of the data
    actually mapped, and can be less than requested, subject to end of
    data). The window sliding code seem to have all information it
    needs with this set of arguments.

Or am I missing something very obvious, and something like this
is just not feasible for some reasons?

I'm asking because I tried to slowly rebase the window-mapping up and
merge the newer branches into it (to get it working with more recent
code). At some point I came over conflicts and one of them got me
thinking about the interface. That's the part:

<<<<<<< HEAD/sha1_file.c
	c = *use_pack(p, w, offset++, NULL);
	*type = (c >> 4) & 7;
	size = c & 15;
	shift = 4;
	while (c & 0x80) {
		c = *use_pack(p, w, offset++, NULL);
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return offset;
=======
	used = unpack_object_header_gently((unsigned char *)p->pack_base +
					   offset,
					   p->pack_size - offset, type, sizep);
	if (!used)
		die("object offset outside of pack file");

	return offset + used;
>>>>>>> f685d07de045423a69045e42e72d2efc22a541ca/sha1_file.c

I was almost about to move your code into unpack_object_header_gently,
but ... The header isn't that big, is it? It is variable in the pack,
but the implementation of the parser is at the moment restricted by
the type we use for object size (unsigned long for the particular
platform). For example:

	/* Object size is in the first 4 bits, and in the low 7 bits
	 * of the subsequent bytes which have high bit set */
	#define MAX_LOCAL_HDRSIZE ((sizeof(long) * 8 - 4) / 7 + 1)
	unsigned long size;
	unsigned char *header = use_pack(p, offset, MAX_LOCAL_HDRSIZE, &size);
	if (!header)
	    die("object header offset out of range");
	/* unpack_object_header_gently takes care about truncated
	 * headers, by returning 0 if it encounters one */
	used = unpack_object_header_gently(header, size, type, sizep);

Wouldn't have to change unpack_object_header_gently at all.

(BTW, current unpack_object_header_gently does not use it's len
argument to check if there actually is enough data to hold at least
minimal header. Is the size of mapped data checked for correctness
somewhere before?)

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]