Re: [PATCHv2 2/2] fast-import: tighten parsing of mark references

Pete Wyckoff <pw@xxxxxxxx> · Tue, 3 Apr 2012 21:20:37 -0400

jrnieder@xxxxxxxxx wrote on Tue, 03 Apr 2012 09:20 -0500:
> (cc-ing Johan for noteimport code)

Thanks, glad you noticed.

> Pete Wyckoff wrote:
> [...]
> > +++ b/fast-import.c
> [...]
> > @@ -2236,20 +2287,24 @@ static void file_change_m(struct branch *b)
> >  
> >  	if (*p == ':') {
> >  		char *x;
> > -		oe = find_mark(strtoumax(p + 1, &x, 10));
> > +		oe = find_mark(parse_mark_ref_space(p, &x));
> >  		hashcpy(sha1, oe->idx.sha1);
> >  		p = x;
> 
> Simpler:
> 
> 	if (*p == ':') {
> 		oe = find_mark(parse_mark_ref_space(p, &p));
> 		hashcpy(sha1, oe->idx.sha1);
> 	} else if ...

Yes.  I thought about just passing in plain old &p.  Even though
these approaches would work, it is a bit more difficult for
novice C coders to read.  Figured we should err on the side of
helping future code readers.  I can add more cleverness if you
feel strongly.

> >  	} else if (!prefixcmp(p, "inline")) {
> >  		inline_data = 1;
> >  		p += 6;
> > +		if (*p != ' ')
> > +			die("Missing space after 'inline': %s",
> > +			    command_buf.buf);
> >  	} else {
> >  		if (get_sha1_hex(p, sha1))
> >  			die("Invalid SHA1: %s", command_buf.buf);
> 
> If I write
> 
> 	M 100644 inliness some/path/to/file
> 
> was my mistake actually leaving out a space after 'inline' or
> was it using an invalid <dataref>?
> 
> I think the latter, so I would suggest
> 
> 	} else if (!prefixcmp(p, "inline ")) {
> 		inline_data = 1;
> 		p += strlen("inline");	/* advance to space */
> 	} else {
> 		if (get_sha1_hex(p, sha1))
> 			...

Insead of "Missing space after 'inline'", you'll get "Invalid
SHA1".  You misspelled "inline" with "inliness"?  And would
prefer to be told you provided an invalid SHA1?

I'm tempted to guess that any string starting with "inline", e.g.
"inlinePath/To/File" without a space is still a good indication
that they were trying to say "inline ".  The chance that they
horribly typed a SHA1, or really had a path staring with "inline"
and forgot the dataref entirely, feel less likely.

> [...]
> >  	}
> > -	if (*p++ != ' ')
> > -		die("Missing space after SHA1: %s", command_buf.buf);
> > +	++p;  /* skip space */
> 
> I guess I'd suggest
> 
> 	assert(*p == ' ');
> 	p++;
> 
> as defense against coders introducing additional cases that
> are not as careful.

Good suggestion, thanks.

> > @@ -2408,20 +2463,24 @@ static void note_change_n(struct branch *b, unsigned char *old_fanout)
> >  	/* <dataref> or 'inline' */
> >  	if (*p == ':') {
> >  		char *x;
> > -		oe = find_mark(strtoumax(p + 1, &x, 10));
> > +		oe = find_mark(parse_mark_ref_space(p, &x));
> >  		hashcpy(sha1, oe->idx.sha1);
> >  		p = x;
> 
> Likewise (btw, why doesn't this share code with the filemodify case?):
> 
> 	if (*p == ':') {
> 		oe = find_mark(parse_mark_with_trailing_space(p, &p));
> 		hashcpy(sha1, oe->idx.sha1);
> 	} else if ...
> 
> and so on.

It does feel like a good opportunity for some refactoring.  Two
out of the three callers to parse an mark followed by a space
could be put together here.

[..]
> > @@ -2955,9 +3007,12 @@ static struct object_entry *parse_treeish_dataref(const char **p)
> >  	} else {	/* <sha1> */
> >  		if (get_sha1_hex(*p, sha1))
> >  			die("Invalid SHA1: %s", command_buf.buf);
> > -		e = find_object(sha1);
> >  		*p += 40;
> > +		if (**p != ' ')
> > +			die("Missing space after SHA1: %s", command_buf.buf);
> > +		e = find_object(sha1);
> 
> This seems dangerous.  What if a new caller arises that wants to
> parse a <dataref> representing a tree-ish at the end of the line?
> 
> So I think checking the character after the tree-ish should still
> be the caller's responsibility.

I was close to just moving parse_treeish_dataref() into its
single caller, parse_ls(), just so we wouldn't have to think
about this.

There are two cases it handles:  mark and sha1.  The mark case
uses the handy new parse_mark_ref_space(), which does the space
checking.  The sha1 branch had no check in this function.  So
I hoisted the space check up to make the branches symmetrical.

> >  	}
> > +	*p += 1;  /* skip space */
> 
> If other patches in flight use the same function, they would expect
> *p to point to the space when parse_treeish_dataref returns.  If we
> wanted to change that (as mentioned above I don't think we ought to)
> then the function's name should be changed to force such new callers
> not to compile.
> 
> > @@ -3008,8 +3063,6 @@ static void parse_ls(struct branch *b)
> >  		root = new_tree_entry();
> >  		hashcpy(root->versions[1].sha1, e->idx.sha1);
> >  		load_tree(root);
> > -		if (*p++ != ' ')
> > -			die("Missing space after tree-ish: %s", command_buf.buf);
> 
> (here's the caller).

I would prefer just to inline the whole thing.  Or new name
parse_ls_dataref() if you have a preference.

> Except where noted above, this looks good.

Thanks.

		-- Pete
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html