Ramkumar Ramachandra wrote: > svndump parses data that is in SVN dumpfile format produced by > `svnadmin dump` with the help of line_buffer and uses repo_tree and > fast_export to emit a git fast-import stream. Probably worth mentioning the this requires a dumpfile v2 (i.e., it does not understand the svndiff0 delta format yet). > +/* Create memory pool for log messages */ > +obj_pool_gen(log, char, 4096); > + > +static char* log_copy(uint32_t length, char *log) > +{ > + char *buffer; > + log_free(log_pool.size); > + buffer = log_pointer(log_alloc(length)); > + strncpy(buffer, log, length); > + return buffer; > +} A strbuf would do just as well. But using obj_pool looks like a fine way to avoid depending on another piece of git code. > +static struct { > + uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid, > + revision_number, node_path, node_kind, node_action, > + node_copyfrom_path, node_copyfrom_rev, text_content_length, > + prop_content_length, content_length; > +} keys; Neat. This is a textbook example of where to use a perfect hash, but comparing interned strings is simpler and fast enough (and the bottlenecks are elsewhere). > +static void read_props(void) [...] > + if (key == keys.svn_log) { > + /* Value length excludes terminating nul. */ > + rev_ctx.log = log_copy(len + 1, val); > + } else if (key == keys.svn_author) { > + rev_ctx.author = pool_intern(val); > + } else if (key == keys.svn_date) { > + if (parse_date_basic(val, &rev_ctx.timestamp, NULL)) > + fprintf(stderr, "Invalid timestamp: %s\n", val); > + } else if (key == keys.svn_executable) { > + node_ctx.type = REPO_MODE_EXE; > + } else if (key == keys.svn_special) { > + node_ctx.type = REPO_MODE_LNK; > + } > + key = ~0; Unknown properties are ignored. Adding stream comments to allow recovering them is left as an exercise for the interested reader. > +static void handle_node(void) > +{ A simple reader does not cope well with this kind of function. It is hard to know if it exhaustively deals with all cases. But: with real-world repos (e.g. ASF) it works well enough. > +void svndump_read(char *url) > +{ Too long. I realize that writing a state machine can be hard in C; maybe it would be easiest to package up the state in a struct and have a separate function for the main loop body. The patches I didn’t comment on all look good. I don’t think anything I did comment on should prevent this reaching a wider audience. Thanks for the pleasant read, Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html