Re: [PATCH 8/8] Add SVN dump parser

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ramkumar Ramachandra wrote:

> svndump parses data that is in SVN dumpfile format produced by
> `svnadmin dump` with the help of line_buffer and uses repo_tree and
> fast_export to emit a git fast-import stream.

Probably worth mentioning the this requires a dumpfile v2 (i.e., it
does not understand the svndiff0 delta format yet).

> +/* Create memory pool for log messages */
> +obj_pool_gen(log, char, 4096);
> +
> +static char* log_copy(uint32_t length, char *log)
> +{
> +	char *buffer;
> +	log_free(log_pool.size);
> +	buffer = log_pointer(log_alloc(length));
> +	strncpy(buffer, log, length);
> +	return buffer;
> +}

A strbuf would do just as well.  But using obj_pool looks like a fine
way to avoid depending on another piece of git code.

> +static struct {
> +	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
> +		revision_number, node_path, node_kind, node_action,
> +		node_copyfrom_path, node_copyfrom_rev, text_content_length,
> +		prop_content_length, content_length;
> +} keys;

Neat.  This is a textbook example of where to use a perfect hash, but
comparing interned strings is simpler and fast enough (and the
bottlenecks are elsewhere).

> +static void read_props(void)
[...]
> +			if (key == keys.svn_log) {
> +				/* Value length excludes terminating nul. */
> +				rev_ctx.log = log_copy(len + 1, val);
> +			} else if (key == keys.svn_author) {
> +				rev_ctx.author = pool_intern(val);
> +			} else if (key == keys.svn_date) {
> +				if (parse_date_basic(val, &rev_ctx.timestamp, NULL))
> +					fprintf(stderr, "Invalid timestamp: %s\n", val);
> +			} else if (key == keys.svn_executable) {
> +				node_ctx.type = REPO_MODE_EXE;
> +			} else if (key == keys.svn_special) {
> +				node_ctx.type = REPO_MODE_LNK;
> +			}
> +			key = ~0;

Unknown properties are ignored.  Adding stream comments to allow
recovering them is left as an exercise for the interested reader.

> +static void handle_node(void)
> +{

A simple reader does not cope well with this kind of function.  It is
hard to know if it exhaustively deals with all cases.

But: with real-world repos (e.g. ASF) it works well enough.

> +void svndump_read(char *url)
> +{

Too long.  I realize that writing a state machine can be hard in C;
maybe it would be easiest to package up the state in a struct and
have a separate function for the main loop body.

The patches I didn’t comment on all look good.  I don’t think anything I did
comment on should prevent this reaching a wider audience.

Thanks for the pleasant read,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]