Hi Stefan, Stefan Sperling writes: > Review below. First, thanks for the detailed review! I'll be travelling over the next few days starting tomorrow, and didn't want to delay the response to your review: I've marked some items "TODO" so that I can grep for them when I'm back in India and fix them. > This diff is needed to build svnrdump as part of svn on Unix: > > Index: build.conf > =================================================================== > --- build.conf (revision 963733) > +++ build.conf (working copy) > @@ -167,6 +167,13 @@ libs = libsvn_wc libsvn_subr apriconv apr > install = bin > manpages = subversion/svnversion/svnversion.1 > > +[svnrdump] > +description = Subversion remote repository dumper > +type = exe > +path = subversion/svnrdump > +libs = libsvn_client libsvn_ra libvsvn_delta libsvn_subr apr > +install = bin > + > # Support for GNOME Keyring > [libsvn_auth_gnome_keyring] > description = Subversion GNOME Keyring Library Thanks! Now included. > Can you include the above bit in your diff, please, and create follow-up > diffs relative to the root of a Subversion trunk working copy? Thanks. Ah, I wasn't paying attention. `git diff` always produces diffs relative to the root. > Please also add a man page similar to the one of svnsync. > Even though I don't like the fact that our man pages simply refer to > the Subversion book rather than providing a small and useful subset of it, > it's good to at least be consistent about it. Fixed. > > +struct dump_edit_baton { > > Please add a comment here explaining what the stream is used for, > for instance /* The output stream we write the dumpfile to. */ > > > + svn_stream_t *stream; Fixed. > > + svn_revnum_t current_rev; > > This is only incremented by never used? Fixed. > > + > > + /* pool is for per-edit-session allocations */ > > + apr_pool_t *pool; > > + > > + /* Store the properties that changed */ > > + apr_hash_t *properties; > > + apr_hash_t *del_properties; /* Value is always 0x1 */ > > Just say "value is undefined". Or use an apr_array_header_t. Fixed. > A comment here saying what propstring is for would be nice. > > > + svn_stringbuf_t *propstring; Fixed. > > + > > + /* Was a copy command issued? */ > > + svn_boolean_t is_copy; > > Copy of what and when? This baton is global for the entire edit... > > Going through the code, I see that you're using this to indicate to > dump_node() whether an add_directory() or add_file() was in fact a copy. > Why not remove this field from the struct and add it as a parameter to > dump_node instead? TODO. > > + > > + /* Path of changed file */ > > + const char *changed_path; > > The changed_path field seems to be unused. > > According to comments in open_file() and add_file(), change_file_prop() > and apply_textdelta() should be using this but they aren't. TODO. > > + /* Temporary file to write delta to along with its checksum */ > > + char *temp_filepath; > > That's a poor variable name. What about delta_abspath? Fixed. > > + svn_checksum_t *checksum; > > And rename this to delta_checksum? Actually, I figured this wasn't used and removed it. > > + > > + /* Flags to trigger dumping props and text */ > > + svn_boolean_t must_dump_props; > > + svn_boolean_t must_dump_text; > > I'd call these dump_props and dump_text, but that's a matter of taste. Fixed. > > + svn_boolean_t dump_props_pending; > > +}; > > + > > +struct dir_baton { > > + struct dump_edit_baton *eb; > > + struct dir_baton *parent_dir_baton; > > + > > + /* is this directory a new addition to this revision? */ > > + svn_boolean_t added; > > + > > + /* has this directory been written to the output stream? */ > > + svn_boolean_t written_out; > > + > > + /* the absolute path to this directory */ > > + const char *path; > > In code written post-svn-1.6, we usually call absolute paths > something_abspath. E.g. local_abspath is an absolute path in a > filesystem on the client (but not in the repository). > > Elsewhere, this is called 'full_path' which would be fine name > for this field, too. (Though any full_path variable you see in svn > most likely pre-dates the *_abspath convention.) Renamed path to abspath. > > + > > + /* the comparison path and revision of this directory. if both of > > + these are valid, use them as a source against which to compare > > + the directory instead of the default comparison source of PATH in > > + the previous revision. */ > > + const char *cmp_path; > > + svn_revnum_t cmp_rev; > > These just seem to be used as regular copyfrom info, so let's name them > as such: copyfrom_path and copyfrom_rev > Then you can also shrink the comment above cause everyone knows what > copyfrom info is: /* Copyfrom info for the node, if any. */ Fixed. > > + > > + /* hash of paths that need to be deleted, though some -might- be > > + replaced. maps const char * paths to this dir_baton. (they're > > + full paths, because that's what the editor driver gives us. but > > + really, they're all within this directory.) */ > > + apr_hash_t *deleted_entries; > > This is very well commented and well named. > > > + > > + /* pool to be used for deleting the hash items */ > > + apr_pool_t *pool; > > Hmmm.. a pool does not delete anything. It provides storage. > Why do you need this? TODO. > > +}; > > + > > +struct handler_baton > > +{ > > + svn_txdelta_window_handler_t apply_handler; > > + void *apply_baton; > > + apr_pool_t *pool; > > Yet another pool. What's it for? See window_handler below :) > > + > > + /* Information about the path of the tempoarary file used */ > > s/tempoarary/temporary/ Fixed. > > + char *temp_filepath; > > + apr_file_t *temp_file; > > + svn_stream_t *temp_filestream; > > What's the temporary file used for? You're writing a delta to it, > so maybe name it accordingly? Fixed. > You need the file name to stat it in close_file(). > You need the stream in the baton to write to the file. > But you don't need the apr_file. > Open the file. Then wrap it in the stream with disown=FALSE, and pass > just the stream to the window handler via the baton. When the stream > is closed, the file will be closed as well. TODO. I want to validate and make sure that I don't break anything else or leak memory before performing this change. > > + > > + /* To fill in the edit baton fields */ > > + struct dump_edit_baton *eb; > > Just say /* Global edit baton. */ or even drop the comment. Fixed. > > +}; > > + > > Needs a docstring. Fixed. I noticed one more mistake: to_rev is unused. Looks like there's a LOT of historical crufts that I didn't clean up. I wonder why the compiler doesn't tell me about all this. > > +/* Make a directory baton to represent the directory was path > > + (relative to EDIT_BATON's path) is PATH. > > The above sentence doesn't parse. > > > + > > + CMP_PATH/CMP_REV are the path/revision against which this directory > > + should be compared for changes. If either is omitted (NULL for the > > + path, SVN_INVALID_REVNUM for the rev), just compare this directory > > + PATH against itself in the previous revision. > > s/CMP/COPYFROM/ and tweak the docstring to say something like: > If the copyfrom information is valid, the directory will be compared > against its copy source. Else, it will be compared against itself in > the previous revision. Another historical cruft: since svnrdump doesn't support non-deltified dumps and doesn't have fs backing, I can't and don't ever (need to) compare it against itself in the previous revision. > > + PARENT_DIR_BATON is the directory baton of this directory's parent, > > + or NULL if this is the top-level directory of the edit. ADDED > > + indicated if this directory is newly added in this revision. > > s/indicated/indicates/ Fixed. > > + apr_array_header_t *compose_path = apr_array_make(pool, 2, > > + sizeof(const char *)); > > The above line contains tabs, please replace with spaces. > And make sure to align function arguments like this (not sure if > they appeared aligned in your editor or not): Fixed. > > + if (pb) { > > I've told you this on IRC before, but just for sake of completeness: > Virtually all if blocks and loops in this patch have a "wrong" style > of indentation. > See > http://subversion.apache.org/docs/community-guide/conventions.html#coding-style > > (I personally prefer the indentation style you're using, > but the project convention has been set looooong ago -- such is life.) Ah, I missed this earlier- you have an svn-dev.el: I'll use it to fix everything before re-submitting. > > + APR_ARRAY_PUSH(compose_path, const char *) = "/"; > > + APR_ARRAY_PUSH(compose_path, const char *) = path; > > + full_path = svn_path_compose(compose_path, pool); > > See svn_dirent_join_many(). Fixed. > > + } > > + else > > + full_path = apr_pstrdup(pool, "/"); > > Why allocate "/" in a pool? This can be static string unless you > intend to write to it. Frankly, working with APR pools was quite a nightmare for me- after observing many cases of leaks and crashes, I jotted down some notes about using them and I made it a point to follow them strictly. This alloc adheres to those notes. I'll submit those notes to dev@ once I've polished them- new devs will probably find it useful. > > +/* > > + * Write out a node record for PATH of type KIND under EB->FS_ROOT. > > + * ACTION describes what is happening to the node (see enum svn_node_action). > > + * Write record to writable EB->STREAM, using EB->BUFFER to write in chunks. > > + * > > + * If the node was itself copied, IS_COPY is TRUE and the > > + * path/revision of the copy source are in CMP_PATH/CMP_REV. If > > + * IS_COPY is FALSE, yet CMP_PATH/CMP_REV are valid, this node is part > > + * of a copied subtree. > > Again, s/CMP/COPYFROM/ Fixed. > > + */ > > +static svn_error_t * > > +dump_node(struct dump_edit_baton *eb, > > + const char *path, /* an absolute path. */ > > + svn_node_kind_t kind, > > + enum svn_node_action action, > > + const char *cmp_path, > > + svn_revnum_t cmp_rev, > > + apr_pool_t *pool) > > +{ > > + /* Write out metadata headers for this file node. */ > > The node might as well be a directory, so the above comment is misleading. Fixed. > > + /* Remove leading slashes from copyfrom paths. */ > > + if (cmp_path) > > + cmp_path = ((*cmp_path == '/') ? cmp_path + 1 : cmp_path); > > What if the copyfrom path is "/"? > (If memory serves me right we've had a bug like this before somewhere...) Right, I copied this out from somewhere I think. Fixed with: if (copyfrom_path && strcmp(copyfrom_path, "/")) > > + > > + switch (action) { > > + /* Appropriately handle the four svn_node_action actions */ > > Nuke the above comment. Don't put numbers that may change some day > into comments. Okay. Fixed. > > + if (!eb->is_copy) { > > + /* eb->dump_props_pending for files is handled in > > + close_file which is called immediately. > > + However, directories are not closed until > > + all the work inside them have been done; > > s/have been/has been/ Fixed. > > + eb->dump_props_pending for directories is > > + handled in all the functions that can > > + possibly be called after add_directory: > > + add_directory, open_directory, > > + delete_entry, close_directory, add_file, > > + open_file and change_dir_prop; > > + change_dir_prop is a special case > > + ofcourse */ > > Please re-format the above using longer lines (up to column 78). Fixed. > > +static svn_error_t *open_root(void *edit_baton, > > + svn_revnum_t base_revision, > > + apr_pool_t *pool, > > + void **root_baton) > > tabs in above 3 lines Fixed. > > +{ > > + /* Allocate a special pool for the edit_baton to avoid pool > > + lifetime issues */ > > I think you don't need this comment because this is already > sort of documented in the docstring for open_root() in svn_delta.h. It took me a while to grasp this even after reading the documentation of open_root, and I think others will find it useful. Didn't remove comment. > > + if (val) > > + /* Delete the path, it's now been dumped */ > > + apr_hash_set(pb->deleted_entries, path, APR_HASH_KEY_STRING, NULL); > > You don't need to set the value to NULL in the hash table. > Doing so won't save any memory. I've say just remove the above 3 lines. Oh, I'm not doing it to save memory. Although I'm not sure if I still need it in my logic, this definitely makes debugging nicer. > tabs again in the above line Fixed. > > + APR_ARRAY_PUSH(compose_path, const char *) = > > + svn_relpath_basename(path, pool); > > and here is another tab Fixed. > > + cmp_path = svn_path_compose(compose_path, pool); > > Again, see svn_dirent_join_many(). > If you need a svn_relpath_join_many() for some reason please write one. Fixed. Yes, svn_relpath_join_many sounds like a good idea: I'll mark that as a TODO for now. > > +static svn_error_t * > > +close_directory(void *dir_baton, > > + apr_pool_t *pool) > > +{ > > + struct dir_baton *db = dir_baton; > > + struct dump_edit_baton *eb = db->eb; > > + apr_hash_index_t *hi; > > + apr_pool_t *subpool = svn_pool_create(pool); > > Please call this iterpool, not subpool. > You're using it in a loop (so we prefer "iteration pool"). Right. Fixed. > > + > > + /* Some pending properties to dump? */ > > + SVN_ERR(dump_props(eb, &(eb->dump_props_pending), TRUE, pool)); > > + > > + /* Dump the directory entries */ > > + for (hi = apr_hash_first(pool, db->deleted_entries); hi; > > + hi = apr_hash_next(hi)) { > > + const void *key; > > + const char *path; > > + apr_hash_this(hi, &key, NULL, NULL); > > + path = key; > > See svn__apr_hash_index_key(). TODO. > > + apr_array_header_t *compose_path = apr_array_make(pool, 2, > > + sizeof(const char *)); > > tabs Fixed. > > + /* If the parent directory has explicit comparison path and rev, > > s/comparison/copyfrom/ Fixed. > > + record the same for this one. */ > > + if (pb && ARE_VALID_COPY_ARGS(pb->cmp_path, pb->cmp_rev)) { > > + APR_ARRAY_PUSH(compose_path, const char *) = pb->cmp_path; > > + APR_ARRAY_PUSH(compose_path, const char *) = > > + svn_relpath_basename(path, pool); > > one more tab Fixed. > > + /* Write information about the filepath to hb->eb */ > > s/to hb->eb/from the handler baton to the edit baton/ Er, I did mean `hb->eb` literally (the editor baton in the handler baton). > > + /* Cleanup */ > > + SVN_ERR(svn_io_file_close(hb->temp_file, hb->pool)); > > As described above, you don't need to close the file, > closing the stream is enough. TODO. > > + /* Custom handler_baton allocated in a separate pool */ > > + apr_pool_t *handler_pool = svn_pool_create(pool); > > + struct handler_baton *hb = apr_pcalloc(handler_pool, sizeof(*hb)); > > + hb->pool = handler_pool; > > It sucks that the window handler does not get pool arguments, so > you have to stick a pool in the baton. But that isn't your fault. Exactly. > > + hb->eb = eb; > > + > > + /* Use a temporary file to measure the text-content-length */ > > + SVN_ERR(svn_io_temp_dir(&tempdir, hb->pool)); > > + > > + hb->temp_filepath = svn_dirent_join(tempdir, "XXXXXX", hb->pool); > > + apr_err = apr_file_mktemp(&(hb->temp_file), hb->temp_filepath, > > + APR_CREATE | APR_READ | APR_WRITE | APR_EXCL, > > + hb->pool); > > + if (apr_err != APR_SUCCESS) > > + SVN_ERR(svn_error_wrap_apr(apr_err, NULL)); > > You can replace the above chunk with a simple call to > svn_io_open_unique_file3(). TODO. If I recall correctly, someone else also suggested this on IRC, but there seems to be some issue with it: I'll check this later. > > + SVN_ERR(svn_io_file_open(&temp_file, eb->temp_filepath,APR_READ, > > + 0600,pool)); > > tabs again Fixed. > > + struct dump_edit_baton *eb = apr_pcalloc(pool, > > + sizeof(struct dump_edit_baton)); > > more tabs Fixed. > > +static int verbose = 0; > > +static apr_pool_t *pool = NULL; > > +static svn_client_ctx_t *ctx = NULL; > > You're only using the client context in open_connection. > Make it a local variable there? I was actually worried about lifetime issues. If ctx won't be read/ written after open_connection, this is okay. Otherwise, not. TODO. > > + struct replay_baton *replay_baton = apr_palloc(pool, > > + sizeof(struct replay_baton)); > > more tabs Fixed. > > + fprintf(out_stream, > > Use svn_cmdline_fprintf() Fixed. > > + "usage: svnrdump URL [-r LOWER[:UPPER]]\n\n" > > This string needs to be marked for localisation like this: _("my string") TODO. I'm missing some header: _ is undefined. > > + "Dump the contents of repository at remote URL to stdout in a 'dumpfile'\n" > > + "v3 portable format. Dump revisions LOWER rev through UPPER rev.\n" > > You don't need to mention the dumpfile format version in the help > string. Okay. I need to mention somewhere that svnrdump doesn't support undeltified dumps though, don't I? > > + "LOWER defaults to 1 and UPPER defaults to the highest possible revision\n" > > + "if omitted.\n"); > > + for (i = 1; i < argc; i++) { > > Please use svn_cmdline__getopt_init() and apr_getopt_long(). > See svnsync for an example. Ouch. Don't you think it's an overkill for the current svnrdump? There are no subcommands and just a few command-line arguments. > Please add a docstring. > > > +void > > +write_hash_to_stringbuf(apr_hash_t *properties, > > + svn_boolean_t deleted, > > + svn_stringbuf_t **strbuf, > > + apr_pool_t *pool); > > + Fixed. > Please add a docstring. > > > +svn_error_t * > > +dump_props(struct dump_edit_baton *eb, > > + svn_boolean_t *trigger_var, > > + svn_boolean_t dump_data_too, > > + apr_pool_t *pool); > > + > > +#endif Fixed. Doxygen-friendly docstrings are a TODO. > > +#include "svn_pools.h" > > +#include "svn_cmdline.h" > > +#include "svn_client.h" > > +#include "svn_ra.h" > > +#include "svn_repos.h" > > Are all these includes really needed? It seems only svn_repos.h is needed. Fixed. > > +void > > +write_hash_to_stringbuf(apr_hash_t *properties, > > + svn_boolean_t deleted, > > + svn_stringbuf_t **strbuf, > > + apr_pool_t *pool) > > +{ > > This function needs a docstring, too. Wait. I just need to write the docstrings once, right? In the header? > And is there no function that already does this somewhere in the svn > libraries or in APR? No. I copied this out from subversion/svnrdump/svnrdump.c and refactored it a little bit. You can see the changes I made after your review in the most recent couple of commits on my GitHub [1]. [1]: http://github.com/artagnon/svn-dump-fast-export/commits/svn-merge -- Ram -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html