Re: [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot

Lea Wiemann <lewiemann@xxxxxxxxx> · Tue, 19 Aug 2008 15:51:03 +0200

Since I didn't mention it: The patch series applies on next.

Lea Wiemann wrote:
> +Git::Repo - Read-only access to the Git repositories.
> +
> +Error handling is simple: On a consistent repository, the Perl
> +interface will never die.  You can use the get_sha1 method to resolve
> +arbitrary object names or check the existence of SHA1 hashes; get_sha1
> +will return undef if the object does not exist in the repository.  Any
> +SHA1 that is returned by get_sha1 can be safely passed to the other
> +Git::Repo methods.

Here's some elaboration on the rationale behind the error handling.  As
a reminder, what we do is force developers to resolve object identifiers
(such as HEAD^) into SHA1s first, rather than allowing them to pass in
arbitrary object identifiers into functions.  Here's why:

a) There's really just one point where errors can occur: at the input
boundary (like the command line).  Hence, you usually need one to three
get_sha1 calls to resolve your input object names, and the rest of your
program will be error-handling free with regard to Git::Repo (that is,
if it dies it's either a bug or an error in the repository structure).

On the other hand, if you don't have such an explicit error-checking
boundary at the beginning of your program (where you resolve all
identifiers), you basically allow invalid object identifiers to "creep"
into your code.  For instance, gitweb oftentimes would have statements
like "or die 'ls-tree failed'" or "or die 'commit not found'" deep
inside a function -- in some cases, I found out that these failures
*could* not even happen since the objects were guaranteed to exist by
earlier calls (and hence it was basically dead code); and in many cases
the error messages were simply non-descript -- which brings me to the
next point:

b) Error reporting is really hard to implement: For instance, if
diff-tree returns non-zero, then unless you scrape its STDERR, you can't
tell which of the two to-be-diffed objects didn't exist (or had the
wrong type), and hence you're oftentimes stuck with a generic 'diff-tree
failed' message.  In other words, if a simple diff-tree call goes wrong,
there are three possible causes: (1) The left object is invalid, (2) the
right object is invalid, (3) something fatal happened (bug or repository
breakage).  Distinguishing these cases is hard, and moving the
object-resolving code to the beginning of the API user's program means
that diff-tree failure can only indicate case (3).

c) The error messages you could get from an API are not usually what you
want anyway.  So if you write

     my $diff = diff_tree($obj_identifier_1, $obj_identifier_2)

and expect it to die with a descriptive error message if one of the two
identifiers doesn't point to a valid tree object, the best your error
message can possibly be is "git diff-tree: HEAD^:foo/bar is not a valid
tree object".  Which leaves users puzzled because (1) they didn't call
diff-tree, and (2) the program might have constructed the
"HEAD^:foo/bar" string, and they only passed in parts of it, so it's not
clear to them where "HEAD^:foo/bar" comes from.

d) Last but not least, using exceptions to communicate errors is against
Perl coding conventions -- I would even have to look up the syntax to
check for a specific exception type after an 'eval' block, because I
simply never needed it.

So I think those four reasons really prevail over the extra work of
having to make 1-3 get_sha1 calls at the beginning of your program (and
providing proper error messages if they fail).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html