Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 03/18/2015 01:40 AM, Junio C Hamano wrote:
Karthik Nayak <karthik.188@xxxxxxxxx> writes:

Subject: Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'

Modify sha1_loose_object_info() to support 'cat-file --literally'
by accepting flags and also make changes to copy the type to
object_info::typename.

It would be more descriptive to mention the central point on the
title regarding what it takes to "support cat-file --literally".

For example:

   sha1_file.c: support reading from a loose object of a bogus type

   Update sha1_loose_object_info() to optionally allow it to read
   from a loose object file of unknown/bogus type; as the function
   usually returns the type of the object it read in the form of enum
   for known types, add an optional "typename" field to receive the
   name of the type in textual form.

By the way, I think your 1/2 would not even compile unless you have
this change; the patches in these two patch series must be swapped,
no?

Noted. Yes it wouldn't. I thought both would be applied together and didn't give it much thought, but yeah, I should pay more attention to that.
diff --git a/sha1_file.c b/sha1_file.c
index 69a60ec..e31e9e2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
  	return git_inflate(stream, 0);
  }

+static int unpack_sha1_header_literally(git_zstream *stream, unsigned char *map,
+					unsigned long mapsize,
+					struct strbuf *header)
+{
+	unsigned char buffer[32], *cp;
+	unsigned long bufsiz = sizeof(buffer);
+	int status;
+
+	/* Get the data stream */
+	memset(stream, 0, sizeof(*stream));
+	stream->next_in = map;
+	stream->avail_in = mapsize;
+	stream->next_out = buffer;
+	stream->avail_out = bufsiz;
+
+	git_inflate_init(stream);
+
+	do {
+		status = git_inflate(stream, 0);
+		strbuf_add(header, buffer, stream->next_out - buffer);
+		for (cp = buffer; cp < stream->next_out; cp++)
+			if (!*cp)
+				/* Found the NUL at the end of the header */
+				return 0;
+		stream->next_out = buffer;
+		stream->avail_out = bufsiz;
+	} while (status == Z_OK);
+	return -1;
+}
+

There is nothing "literal" about this function.

The only difference between the original and this one is that this
uses a strbuf, instead of a buffer of a fixed size allocated by the
caller, to return the early part of the inflated data.

I wonder if it incurs too much overhead to the existing callers if
you reimplementated unpack_sha1_header() as a thin wrapper around
this function, something like

	int unpack_sha1_header(git_zstream *stream,
				unsigned char *map, unsigned long mapsize,
                                 void *buffer, unsigned long bufsiz)
	{
                 int status;
		struct strbuf header = STRBUF_INIT;

                 status = unpack_sha1_header_to_strbuf(stream, map, mapsize, &header);
		if (bufsiz < header.len)
                 	die("object header too long");
		memcpy(buffer, header.buf, header.len);
                 return status;
	}
     			
Note that I am *not* suggesting to do this blindly.  If there is no
downside from performance point of view, however, the above would be
a way to avoid code duplication.

Another way to avoid code duplication may be to implement
unpack_sha1_header_to_strbuf() in terms of unpack_sha1_header(),
perhaps like this:

	static int unpack_sha1_header_to_strbuf(...)
         {
		unsigned char buffer[32], *cp;
                 unsigned long bufsiz = sizeof(buffer);
         	int status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);

		if (status)
                 	return status;
		do {
			strbuf_add(header, buffer, stream->next_out - buffer);
			for (cp = buffer; cp < stream->next_out; cp++)
				if (!*cp)
					/* Found the NUL at the end of the header */
					return 0;
			stream->next_out = buffer;
			stream->avail_out = bufsiz;
		} while (status == Z_OK);
		return -1;
	}

which may be more in line with how we read data from loose objects.

Agreed there is code duplication with unpack_sha1_header() and unpack_sha1_header_extended(). But I thought there would be a performance trade off, Any way I could test this? Also the second suggestion you have given seems to be more practical, As there is no performance overhead, if called by existing code.

+int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
+			       int flags)

Unless we are taking advantage of the fact that MSB is special in
signed integral types, I would prefer to see us use an unsigned type
to store these bits in a variable of an integral type.  That would
let the readers assume that they have fewer tricky things possibly
going on in the code (also see the footnote to $gmane/263751).

Thanks for the link. Will change to unsigned.

@@ -1652,12 +1674,45 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
  			size = size * 10 + c;
  		}
  	}
-	*sizep = size;
+
+	type = type_from_string_gently(typename.buf, -1, 1);

Doesn't the code know how long the typename is at this point?
Yes, will change.

+	if (oi->sizep)
+		*oi->sizep = size;
+	if (oi->typename)
+		strbuf_addstr(oi->typename, typename.buf);

Likewise.  Perhaps strbuf_addbuf()?
That also calls strbuf_add(). But does improve readability

+	if (oi->typep)
+		*oi->typep = type;

Do you want to store -1 to *oi->typep when the caller asked to do
the "literally" thing?  Shouldn't it match what is returned from
this function?
Yes it should. Will make changes.

+	return *hdr ? -1 : type;
+}
+
+/*
+ * We used to just use "sscanf()", but that's actually way
+ * too permissive for what we want to check. So do an anal
+ * object header parse by hand. Calls the extended function.
+ */

The comment "let's do better than sscanf() by parsing ourselves"
applies to the implementation of _extended() function, not this one,
no?  It is clear to everybody that it "Calls the extended function",
so why mention it?

+int parse_sha1_header(const char *hdr, unsigned long *sizep)
+{
+	struct object_info oi;
+
+	oi.sizep = sizep;
+	oi.typename = NULL;
+	oi.typep = NULL;
+	return parse_sha1_header_extended(hdr, &oi, LOOKUP_REPLACE_OBJECT);
  }

@@ -2524,13 +2579,15 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1,
  }

  static int sha1_loose_object_info(const unsigned char *sha1,
-				  struct object_info *oi)
+				  struct object_info *oi,
+				  int flags)
  {
-	int status;
+	int status = 0;
  	unsigned long mapsize, size;
  	void *map;
  	git_zstream stream;
  	char hdr[32];
+	struct strbuf hdrbuf = STRBUF_INIT;

  	if (oi->delta_base_sha1)
  		hashclr(oi->delta_base_sha1);
@@ -2557,17 +2614,29 @@ static int sha1_loose_object_info(const unsigned char *sha1,
  		return -1;
  	if (oi->disk_sizep)
  		*oi->disk_sizep = mapsize;
-	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
-		status = error("unable to unpack %s header",
-			       sha1_to_hex(sha1));
-	else if ((status = parse_sha1_header(hdr, &size)) < 0)
-		status = error("unable to parse %s header", sha1_to_hex(sha1));
-	else if (oi->sizep)
+	if ((flags & LOOKUP_LITERALLY)) {
+		if (unpack_sha1_header_literally(&stream, map, mapsize, &hdrbuf) < 0)
+			status = error("unable to unpack %s header with --literally",
+				       sha1_to_hex(sha1));
+		else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
+			status = error("unable to parse %s header", sha1_to_hex(sha1));
+	} else {
+		if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+			status = error("unable to unpack %s header",
+				       sha1_to_hex(sha1));
+		else if ((status = parse_sha1_header(hdr, &size)) < 0)
+			status = error("unable to parse %s header", sha1_to_hex(sha1));
+	}
+	if (oi->sizep)
  		*oi->sizep = size;

Have you considered calling parse_sha1_header_extended() for both
literally and normal cases?  Then you wouldn't have to do any of the
"if (oi->blah) *oi->blah = value", no?

  	git_inflate_end(&stream);
  	munmap(map, mapsize);
  	if (oi->typep)
  		*oi->typep = status;

Likewise.

+	if (oi->typename && !(oi->typename->len))
+		strbuf_addstr(oi->typename, typename(status));

Likewise.


No. I have not considered that. will look into this.
Thanks for the suggestions.

Regards
-Karthik
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]