Re: [PATCH v2] Add svnrdump

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Stefan,

Stefan Sperling writes:
> Review below.

First, thanks for the detailed review! I'll be travelling over the
next few days starting tomorrow, and didn't want to delay the response
to your review: I've marked some items "TODO" so that I can grep for
them when I'm back in India and fix them.

> This diff is needed to build svnrdump as part of svn on Unix:
> 
> Index: build.conf
> ===================================================================
> --- build.conf	(revision 963733)
> +++ build.conf	(working copy)
> @@ -167,6 +167,13 @@ libs = libsvn_wc libsvn_subr apriconv apr
>  install = bin
>  manpages = subversion/svnversion/svnversion.1
>  
> +[svnrdump]
> +description = Subversion remote repository dumper
> +type = exe
> +path = subversion/svnrdump
> +libs = libsvn_client libsvn_ra libvsvn_delta libsvn_subr apr
> +install = bin
> +
>  # Support for GNOME Keyring
>  [libsvn_auth_gnome_keyring]
>  description = Subversion GNOME Keyring Library

Thanks! Now included.

> Can you include the above bit in your diff, please, and create follow-up
> diffs relative to the root of a Subversion trunk working copy? Thanks.

Ah, I wasn't paying attention. `git diff` always produces diffs
relative to the root.

> Please also add a man page similar to the one of svnsync.
> Even though I don't like the fact that our man pages simply refer to
> the Subversion book rather than providing a small and useful subset of it,
> it's good to at least be consistent about it.

Fixed.

> > +struct dump_edit_baton {
> 
> Please add a comment here explaining what the stream is used for,
> for instance /* The output stream we write the dumpfile to. */
> 
> > +  svn_stream_t *stream;

Fixed.

> > +  svn_revnum_t current_rev;
> 
> This is only incremented by never used?

Fixed.

> > +
> > +  /* pool is for per-edit-session allocations */
> > +  apr_pool_t *pool;
> > +
> > +  /* Store the properties that changed */
> > +  apr_hash_t *properties;
> > +  apr_hash_t *del_properties; /* Value is always 0x1 */
> 
> Just say "value is undefined". Or use an apr_array_header_t.

Fixed.

> A comment here saying what propstring is for would be nice.
> 
> > +  svn_stringbuf_t *propstring;

Fixed.

> > +
> > +  /* Was a copy command issued? */
> > +  svn_boolean_t is_copy;
> 
> Copy of what and when? This baton is global for the entire edit...
> 
> Going through the code, I see that you're using this to indicate to
> dump_node() whether an add_directory() or add_file() was in fact a copy.
> Why not remove this field from the struct and add it as a parameter to
> dump_node instead?

TODO.

> > +
> > +  /* Path of changed file */
> > +  const char *changed_path;
> 
> The changed_path field seems to be unused.
> 
> According to comments in open_file() and add_file(), change_file_prop()
> and apply_textdelta() should be using this but they aren't.

TODO.

> > +  /* Temporary file to write delta to along with its checksum */
> > +  char *temp_filepath;
> 
> That's a poor variable name. What about delta_abspath?

Fixed.

> > +  svn_checksum_t *checksum;
> 
> And rename this to delta_checksum?

Actually, I figured this wasn't used and removed it.

> > +
> > +  /* Flags to trigger dumping props and text */
> > +  svn_boolean_t must_dump_props;
> > +  svn_boolean_t must_dump_text;
> 
> I'd call these dump_props and dump_text, but that's a matter of taste.

Fixed.

> > +  svn_boolean_t dump_props_pending;
> > +};
> > +
> > +struct dir_baton {
> > +  struct dump_edit_baton *eb;
> > +  struct dir_baton *parent_dir_baton;
> > +
> > +  /* is this directory a new addition to this revision? */
> > +  svn_boolean_t added;
> > +
> > +  /* has this directory been written to the output stream? */
> > +  svn_boolean_t written_out;
> > +
> > +  /* the absolute path to this directory */
> > +  const char *path;
> 
> In code written post-svn-1.6, we usually call absolute paths
> something_abspath. E.g. local_abspath is an absolute path in a
> filesystem on the client (but not in the repository).
> 
> Elsewhere, this is called 'full_path' which would be fine name
> for this field, too. (Though any full_path variable you see in svn
> most likely pre-dates the *_abspath convention.)

Renamed path to abspath.

> > +
> > +  /* the comparison path and revision of this directory.  if both of
> > +     these are valid, use them as a source against which to compare
> > +     the directory instead of the default comparison source of PATH in
> > +     the previous revision. */
> > +  const char *cmp_path;
> > +  svn_revnum_t cmp_rev;
> 
> These just seem to be used as regular copyfrom info, so let's name them
> as such: copyfrom_path and copyfrom_rev
> Then you can also shrink the comment above cause everyone knows what
> copyfrom info is: /* Copyfrom info for the node, if any. */

Fixed.

> > +
> > +  /* hash of paths that need to be deleted, though some -might- be
> > +     replaced.  maps const char * paths to this dir_baton.  (they're
> > +     full paths, because that's what the editor driver gives us.  but
> > +     really, they're all within this directory.) */
> > +  apr_hash_t *deleted_entries;
> 
> This is very well commented and well named.
> 
> > +
> > +  /* pool to be used for deleting the hash items */
> > +  apr_pool_t *pool;
> 
> Hmmm.. a pool does not delete anything. It provides storage.
> Why do you need this?

TODO.

> > +};
> > +
> > +struct handler_baton
> > +{
> > +  svn_txdelta_window_handler_t apply_handler;
> > +  void *apply_baton;
> > +  apr_pool_t *pool;
> 
> Yet another pool. What's it for?

See window_handler below :)

> > +
> > +  /* Information about the path of the tempoarary file used */
> 
> s/tempoarary/temporary/

Fixed.

> > +  char *temp_filepath;
> > +  apr_file_t *temp_file;
> > +  svn_stream_t *temp_filestream;
> 
> What's the temporary file used for? You're writing a delta to it,
> so maybe name it accordingly?

Fixed.

> You need the file name to stat it in close_file().
> You need the stream in the baton to write to the file.
> But you don't need the apr_file.
> Open the file. Then wrap it in the stream with disown=FALSE, and pass
> just the stream to the window handler via the baton. When the stream
> is closed, the file will be closed as well.

TODO. I want to validate and make sure that I don't break anything
else or leak memory before performing this change.

> > +
> > +  /* To fill in the edit baton fields */
> > +  struct dump_edit_baton *eb;
> 
> Just say /* Global edit baton. */ or even drop the comment.

Fixed.

> > +};
> > +
> 
> Needs a docstring.

Fixed. I noticed one more mistake: to_rev is unused. Looks like
there's a LOT of historical crufts that I didn't clean up. I wonder
why the compiler doesn't tell me about all this.

> > +/* Make a directory baton to represent the directory was path
> > +   (relative to EDIT_BATON's path) is PATH.
> 
> The above sentence doesn't parse.
> 
> > +
> > +   CMP_PATH/CMP_REV are the path/revision against which this directory
> > +   should be compared for changes.  If either is omitted (NULL for the
> > +   path, SVN_INVALID_REVNUM for the rev), just compare this directory
> > +   PATH against itself in the previous revision.
> 
> s/CMP/COPYFROM/ and tweak the docstring to say something like:
>   If the copyfrom information is valid, the directory will be compared
>   against its copy source. Else, it will be compared against itself in
>   the previous revision.

Another historical cruft: since svnrdump doesn't support non-deltified
dumps and doesn't have fs backing, I can't and don't ever (need to)
compare it against itself in the previous revision.

> > +   PARENT_DIR_BATON is the directory baton of this directory's parent,
> > +   or NULL if this is the top-level directory of the edit.  ADDED
> > +   indicated if this directory is newly added in this revision.
> 
> s/indicated/indicates/

Fixed.

> > +  apr_array_header_t *compose_path = apr_array_make(pool, 2,
> > +						    sizeof(const char *));
> 
> The above line contains tabs, please replace with spaces.
> And make sure to align function arguments like this (not sure if
> they appeared aligned in your editor or not):

Fixed.

> > +  if (pb) {
> 
> I've told you this on IRC before, but just for sake of completeness:
> Virtually all if blocks and loops in this patch have a "wrong" style
> of indentation.
> See
> http://subversion.apache.org/docs/community-guide/conventions.html#coding-style
> 
> (I personally prefer the indentation style you're using,
> but the project convention has been set looooong ago -- such is life.)

Ah, I missed this earlier- you have an svn-dev.el: I'll use it to fix
everything before re-submitting.

> > +    APR_ARRAY_PUSH(compose_path, const char *) = "/";
> > +    APR_ARRAY_PUSH(compose_path, const char *) = path;
> > +    full_path = svn_path_compose(compose_path, pool);
> 
> See svn_dirent_join_many().

Fixed.

> > +  }
> > +  else
> > +    full_path = apr_pstrdup(pool, "/");
> 
> Why allocate "/" in a pool? This can be static string unless you
> intend to write to it.

Frankly, working with APR pools was quite a nightmare for me- after
observing many cases of leaks and crashes, I jotted down some notes
about using them and I made it a point to follow them strictly. This
alloc adheres to those notes. I'll submit those notes to dev@ once
I've polished them- new devs will probably find it useful.

> > +/*
> > + * Write out a node record for PATH of type KIND under EB->FS_ROOT.
> > + * ACTION describes what is happening to the node (see enum svn_node_action).
> > + * Write record to writable EB->STREAM, using EB->BUFFER to write in chunks.
> > + *
> > + * If the node was itself copied, IS_COPY is TRUE and the
> > + * path/revision of the copy source are in CMP_PATH/CMP_REV.  If
> > + * IS_COPY is FALSE, yet CMP_PATH/CMP_REV are valid, this node is part
> > + * of a copied subtree.
> 
> Again, s/CMP/COPYFROM/

Fixed.

> > + */
> > +static svn_error_t *
> > +dump_node(struct dump_edit_baton *eb,
> > +          const char *path,    /* an absolute path. */
> > +          svn_node_kind_t kind,
> > +          enum svn_node_action action,
> > +          const char *cmp_path,
> > +          svn_revnum_t cmp_rev,
> > +          apr_pool_t *pool)
> > +{
> > +  /* Write out metadata headers for this file node. */
> 
> The node might as well be a directory, so the above comment is misleading.

Fixed.

> > +  /* Remove leading slashes from copyfrom paths. */
> > +  if (cmp_path)
> > +    cmp_path = ((*cmp_path == '/') ? cmp_path + 1 : cmp_path);
> 
> What if the copyfrom path is "/"?
> (If memory serves me right we've had a bug like this before somewhere...)

Right, I copied this out from somewhere I think. Fixed with:
if (copyfrom_path && strcmp(copyfrom_path, "/"))

> > +
> > +  switch (action) {
> > +    /* Appropriately handle the four svn_node_action actions */
> 
> Nuke the above comment. Don't put numbers that may change some day
> into comments.

Okay. Fixed.

> > +    if (!eb->is_copy) {
> > +      /* eb->dump_props_pending for files is handled in
> > +         close_file which is called immediately.
> > +         However, directories are not closed until
> > +         all the work inside them have been done;
> 
> s/have been/has been/

Fixed.

> > +         eb->dump_props_pending for directories is
> > +         handled in all the functions that can
> > +         possibly be called after add_directory:
> > +         add_directory, open_directory,
> > +         delete_entry, close_directory, add_file,
> > +         open_file and change_dir_prop;
> > +         change_dir_prop is a special case
> > +         ofcourse */
> 
> Please re-format the above using longer lines (up to column 78).

Fixed.

> > +static svn_error_t *open_root(void *edit_baton,
> > +			      svn_revnum_t base_revision,
> > +			      apr_pool_t *pool,
> > +			      void **root_baton)
> 
> tabs in above 3 lines

Fixed.

> > +{
> > +  /* Allocate a special pool for the edit_baton to avoid pool
> > +     lifetime issues */
> 
> I think you don't need this comment because this is already
> sort of documented in the docstring for open_root() in svn_delta.h.

It took me a while to grasp this even after reading the documentation
of open_root, and I think others will find it useful. Didn't remove
comment.

> > +  if (val)
> > +    /* Delete the path, it's now been dumped */
> > +    apr_hash_set(pb->deleted_entries, path, APR_HASH_KEY_STRING, NULL);
> 
> You don't need to set the value to NULL in the hash table.
> Doing so won't save any memory. I've say just remove the above 3 lines.

Oh, I'm not doing it to save memory. Although I'm not sure if I still
need it in my logic, this definitely makes debugging nicer.

> tabs again in the above line

Fixed.

> > +    APR_ARRAY_PUSH(compose_path, const char *) =
> > +	    svn_relpath_basename(path, pool);
> 
> and here is another tab 

Fixed.

> > +    cmp_path = svn_path_compose(compose_path, pool);
> 
> Again, see svn_dirent_join_many().
> If you need a svn_relpath_join_many() for some reason please write one.

Fixed. Yes, svn_relpath_join_many sounds like a good idea: I'll mark
that as a TODO for now.

> > +static svn_error_t *
> > +close_directory(void *dir_baton,
> > +                apr_pool_t *pool)
> > +{
> > +  struct dir_baton *db = dir_baton;
> > +  struct dump_edit_baton *eb = db->eb;
> > +  apr_hash_index_t *hi;
> > +  apr_pool_t *subpool = svn_pool_create(pool);
> 
> Please call this iterpool, not subpool.
> You're using it in a loop (so we prefer "iteration pool").

Right. Fixed.

> > +
> > +  /* Some pending properties to dump? */
> > +  SVN_ERR(dump_props(eb, &(eb->dump_props_pending), TRUE, pool));
> > +
> > +  /* Dump the directory entries */
> > +  for (hi = apr_hash_first(pool, db->deleted_entries); hi;
> > +       hi = apr_hash_next(hi)) {
> > +    const void *key;
> > +    const char *path;
> > +    apr_hash_this(hi, &key, NULL, NULL);
> > +    path = key;
> 
> See svn__apr_hash_index_key().

TODO.

> > +  apr_array_header_t *compose_path = apr_array_make(pool, 2,
> > +						    sizeof(const char *));
> 
> tabs

Fixed.

> > +  /* If the parent directory has explicit comparison path and rev,
> 
> s/comparison/copyfrom/

Fixed.

> > +     record the same for this one. */
> > +  if (pb && ARE_VALID_COPY_ARGS(pb->cmp_path, pb->cmp_rev)) {
> > +    APR_ARRAY_PUSH(compose_path, const char *) = pb->cmp_path;
> > +    APR_ARRAY_PUSH(compose_path, const char *) =
> > +	    svn_relpath_basename(path, pool);
> 
> one more tab

Fixed.

> > +  /* Write information about the filepath to hb->eb */
> 
> s/to hb->eb/from the handler baton to the edit baton/

Er, I did mean `hb->eb` literally (the editor baton in the handler
baton).

> > +  /* Cleanup */
> > +  SVN_ERR(svn_io_file_close(hb->temp_file, hb->pool));
> 
> As described above, you don't need to close the file,
> closing the stream is enough.

TODO.

> > +  /* Custom handler_baton allocated in a separate pool */
> > +  apr_pool_t *handler_pool = svn_pool_create(pool);
> > +  struct handler_baton *hb = apr_pcalloc(handler_pool, sizeof(*hb));
> > +  hb->pool = handler_pool;
> 
> It sucks that the window handler does not get pool arguments, so
> you have to stick a pool in the baton. But that isn't your fault.

Exactly.

> > +  hb->eb = eb;
> > +
> > +  /* Use a temporary file to measure the text-content-length */
> > +  SVN_ERR(svn_io_temp_dir(&tempdir, hb->pool));
> > +
> > +  hb->temp_filepath = svn_dirent_join(tempdir, "XXXXXX", hb->pool);
> > +  apr_err = apr_file_mktemp(&(hb->temp_file), hb->temp_filepath,
> > +          APR_CREATE | APR_READ | APR_WRITE | APR_EXCL,
> > +          hb->pool);
> > +  if (apr_err != APR_SUCCESS)
> > +    SVN_ERR(svn_error_wrap_apr(apr_err, NULL));
> 
> You can replace the above chunk with a simple call to
> svn_io_open_unique_file3().

TODO. If I recall correctly, someone else also suggested this on IRC,
but there seems to be some issue with it: I'll check this later.

> > +    SVN_ERR(svn_io_file_open(&temp_file, eb->temp_filepath,APR_READ,
> > +			     0600,pool));
> 
> tabs again

Fixed.

> > +  struct dump_edit_baton *eb = apr_pcalloc(pool,
> > +					   sizeof(struct dump_edit_baton));
> 
> more tabs

Fixed.

> > +static int verbose = 0;
> > +static apr_pool_t *pool = NULL;
> > +static svn_client_ctx_t *ctx = NULL;
> 
> You're only using the client context in open_connection.
> Make it a local variable there?

I was actually worried about lifetime issues. If ctx won't be read/
written after open_connection, this is okay. Otherwise, not. TODO.

> > +  struct replay_baton *replay_baton = apr_palloc(pool,
> > +						 sizeof(struct replay_baton));
> 
> more tabs

Fixed.

> > +  fprintf(out_stream,
> 
> Use svn_cmdline_fprintf()

Fixed.

> > +    "usage: svnrdump URL [-r LOWER[:UPPER]]\n\n"
> 
> This string needs to be marked for localisation like this: _("my string")

TODO. I'm missing some header: _ is undefined.

> > +    "Dump the contents of repository at remote URL to stdout in a 'dumpfile'\n"
> > +    "v3 portable format.  Dump revisions LOWER rev through UPPER rev.\n"
> 
> You don't need to mention the dumpfile format version in the help
> string.

Okay. I need to mention somewhere that svnrdump doesn't support
undeltified dumps though, don't I?

> > +    "LOWER defaults to 1 and UPPER defaults to the highest possible revision\n"
> > +    "if omitted.\n");
> > +  for (i = 1; i < argc; i++) {
> 
> Please use svn_cmdline__getopt_init() and apr_getopt_long().
> See svnsync for an example.

Ouch. Don't you think it's an overkill for the current svnrdump? There
are no subcommands and just a few command-line arguments.

> Please add a docstring.
> 
> > +void
> > +write_hash_to_stringbuf(apr_hash_t *properties,
> > +                        svn_boolean_t deleted,
> > +                        svn_stringbuf_t **strbuf,
> > +                        apr_pool_t *pool);
> > +

Fixed.

> Please add a docstring.
> 
> > +svn_error_t *
> > +dump_props(struct dump_edit_baton *eb,
> > +           svn_boolean_t *trigger_var,
> > +           svn_boolean_t dump_data_too,
> > +           apr_pool_t *pool);
> > +
> > +#endif

Fixed. Doxygen-friendly docstrings are a TODO.

> > +#include "svn_pools.h"
> > +#include "svn_cmdline.h"
> > +#include "svn_client.h"
> > +#include "svn_ra.h"
> > +#include "svn_repos.h"
> 
> Are all these includes really needed?

It seems only svn_repos.h is needed. Fixed.

> > +void
> > +write_hash_to_stringbuf(apr_hash_t *properties,
> > +                        svn_boolean_t deleted,
> > +                        svn_stringbuf_t **strbuf,
> > +                        apr_pool_t *pool)
> > +{
> 
> This function needs a docstring, too.

Wait. I just need to write the docstrings once, right? In the header?

> And is there no function that already does this somewhere in the svn
> libraries or in APR?

No. I copied this out from subversion/svnrdump/svnrdump.c and
refactored it a little bit.

You can see the changes I made after your review in the most recent
couple of commits on my GitHub [1].

[1]: http://github.com/artagnon/svn-dump-fast-export/commits/svn-merge

-- Ram
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]