Re: [PATCH 2/3] add new Git::Repo API

Lea Wiemann <lewiemann@xxxxxxxxx> · Tue, 15 Jul 2008 00:19:31 +0200

Petr Baudis wrote:
> Maybe I sound too mentoring at places; [OTOH] if something general gets into
> the Git tree, I'd like to make sure it's something we can live happily
> with for long time, not just a hack tailored for gitweb caching.

Yup, I agree (and you don't sound too mentoring ^^).  Thanks for the
review and feedback.

> I really miss some more detailed writeup on the envisioned design here. [...]
> Most importantly, how is Git::Repo interacting with working copies,

Git::Repo is not interacting with working copies at all right now.  Is
there anything you think should be considered for its design?

Here's a write-up about the design (I'll probably move this into
Git::Repo's man page):

----------
Git::Repo aims to provide low-level access to Git repositories.  For
instance, you can resolve object names (like "HEAD~2") to SHA1s, and
inspect objects.  It does not attempt to be a wrapper around the git
plumbing or porcelain commands.

Error handling is simple: On a consistent repository, the Perl interface
will never die.  You can use the get_sha1 method to resolve arbitrary
object names or check the existence of SHA1 hashes; get_sha1 will return
undef if the object does not exist in the repository.  Any SHA1 that is
returned by get_sha1 can be safely passed to the other Git::Repo methods.
----------

> First, I don't think it's good idea at all to put the pipe-related stuff
> to Git::Repo - this is botched up API just like the current one.

Well, they're more like helper methods.  Since they don't fit into the
design goals of the Git::Repo API at all, I'd suggest we just
underscore-prefix them and take them out of the man page.  (The only
reason why I hadn't done this is that gitweb uses $repo->cmd_output
extensively, so it'd end up with a lot of underscore calls.  But I
suppose we can either alias _cmd_output to cmd_output in gitweb's
CachedRepo subclass, or live with $repo->_cmd_output calls.)  Does
underscore-prefixing sound good to you?

If someone wants to come up with a consistent nice interface for calling
git commands, sure.  I wasn't actually trying to do that.

> Here is an idea: Introduce Git::Command object [and a Git::CommandFactory]
> 
> 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file', \
> 		'-p', 'bla'], {pipe_out=>1})    [...]
> 	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
> 
> Or am I overdoing it?

Yes, I think so. ;-)  All we're talking about here is a wrapper around
"open '-|'" calls (plus some workarounds for Windows I suppose).

I don't have much of a notion of a 'command' as an object in my head;
your (snipped) example makes it look like you're trying to create a
IO::Handle-compatible interface, which I think is way too much effort
(and error-prone) for simple pipes.  Also, a CommandFactory class just
to catenate lists together seems like overkill to me.

Something like a command interface *may* make sense if it's tied to
repositories or working copies, in which case it could automatically set
--git-dir or --work-tree, but it's beyond the scope of what I'm trying
to create here, and I don't think it's even overly useful.

> Instead, I believe the best course is to gradually translate all the
> Git.pm functionality to the new OO model, leaving Git.pm as a
> compatibility wrapper. Now, if you believe this is a non-trivial task,
> please tell us why.

Well, as I said, the fact that there are untested parts in Git.pm
doesn't exactly make it trivial to refactor.  Also, only very few parts
can be cleanly moved to Git::Repo.  In "grep '^sub [^_]' Git.pm" I find
only cat_blob and perhaps hash_object to be eligible to be moved (though
hash_object should probably live in a working-copy/non-bare-repo class,
with an optional insert => 1 parameter).  And even cat_blob is
non-trivial to move unless you want the whole blob to be read into memory.

That's a lot of non-trivialness for very little gain.  I doubt I'd even
have enough time till the end of GSoC (minus vacation) to do this.

> It should be actually very easy to start with moving all the pipe
> functionality to Git::Command.

Creating a new (Git::Command) API is very much non-trivial, apart from
the fact that I'm not convinced that we need Git::Command, and that a
clean command interface neither falls out of Git.pm nor Git::Repo.

>>   I'm working on caching for gitweb, not on implementing the
>>   next great Perl API for Git.  (And Git::Repo isn't great, FTR.)
> 
> Wait, I can't make sense out of this paragraph. If Git.pm sucks, we can
> work on new API. But we better _make_ it great. Or someone else comes by
> next year and says "oh, but it's buggy and needs refactoring, let's
> throw it away and redesign it!"

Sorry, I wasn't clear with my parenthesed remark: I actually think that
Git::Repo is pretty good in terms of code and interface quality.  It's
just not *complete*, even in its limited scope, and I'm not attempting
to make it complete.

I do think that someone who wants to extend Git::Repo (like Jakub with
Git::Config) won't have much trouble doing so with the existing design.

>> +use constant _MESSAGE => 'M';
>> +use constant _ENCODING => 'E';   [snip]
> 
> if you are going to use hashes
> anyway, why not actually key by sensible name directly?

Embarrassingly premature optimization here. ^_^  I'll fix it.

>> $commit = Git::Commit->new($repo, $sha1)
> 
> Are we sure we don't want hash-based arguments instead? This is badly
> extensible and inconsistent with the rest of the API.

*ponders*  Every commit needs a repo and a SHA1, so those will never get
optional.  We can always add hash-based options after the two mandatory
arguments, but I don't even see any such possible options at the moment.
 (And if I turn out to be completely wrong, we can even move to a
hash-only argument list by checking the type of the first parameter.)
Really, I wouldn't worry.

>> [Git::Commit->new, Git::Tag->new:]
>> +Calls to this method are free, since it does not check whether $sha1
>> +exists and has the right type.  However, accessing any of the commit
>> +object's properties will fail if $sha1 is not a valid commit object.
> 
> This is nice idea, but I'd also provide a well-defined way for the user
> to verify the object's validity at a good moment; basically, make load()
> a public method. The user can deal with errors then and rely on
> error-free behavior later.

No, you should never pass in an invalid SHA1 in the first place.  The
above piece of documentation is just a warning that bugs will show up
delayed.  IOW, this is not the right place to have your error handling.

If you're getting a SHA1 through the user-interface, check its existence
with get_sha1 before passing it to the constructor.

>> +Note that $sha1 must be the SHA1 of a commit object; tag objects are
>> +not dereferenced.
> 
> Why not?

Because the SHA1 might resolve to an object of the wrong type, which
means you have to do error handling in Git::Object objects; that's the
wrong place.

If tag-resolving is really needed, we can add an optional $type
parameter to get_sha1, which will cause get_sha1 to resolve the object
until a $type object is found, or return undef if the object is or
resolves to an object of the wrong type.

I have resolving code in gitweb's git_get_sha1_or_die (which I didn't
implement in Git::Repo since it uses some customized error reporting).
The resolving code could conceivably be extracted and moved to get_sha1.
 I think there are a few things to ponder and maybe discuss, so I'd do
that in a separate patch (if I get around it before the end of the project).

>> +=item $commit->authors
> 
> author

Fixed.

>> +Objects are loaded lazily, and hence instantiation is free.  Objects
>> +stringify to their SHA1s.
> 
> Maybe use the term 'Git database objects'? This way, it seems as if we
> are talking about all Git/*.pm objects.

I've replaced it with: "Objects are loaded lazily, and hence
instantiation is free.  Git::Object instances stringify to their SHA1s."

>> +sub sha1 {
>> +sub stringify {
> 
> Why not overload "" directly to sha1()?

Done (and removed stringify).

>> +sub assert_opts {
>> +sub assert_sha1 {
> 
> Pretend names with underscores, since they are internal?

Done, and removed them from @EXPORT_OK.

>> 'directory': The directory of the repository (mandatory).
> 
> I don't think making this mandatory is reasonable, since all the git
> commands can automatically figure this out by themselves too; so can
> Git::Repo easily by calling git rev-parse --git-dir.

Sure, it can be made non-mandatory if it's needed, but there are so many
possibilities for the exact time and place at which the repo directory
should be resolved using rev-parse (if at all) that I'd rather leave
this to the person who has an actual use-case for it.  I'm not a fan of
designing APIs before they are needed.

>> [Snipped a lot of quoting --LW]
>> +=item $repo->repo_dir
>> +=item $repo->git_binary
>> +=item $repo->version
>> +sub _get_git_cmd {
> 
> This definitely does not belong to a Git::Repo object.

Which of those methods are you referring to?  I think $repo->version
might reasonably be removed (and the code re-added to gitweb); I'll do
so unless you object.  _get_git_cmd is already underscored, and repo_dir
and git_binary only access attributes passed in through the constructor,
so I think those three should stay.

>> +=item $repo->cat_file($sha1)
> 
> I don't think this is good combination of semantic and name. Since we
> don't do the same thing as plain git cat-file, we might as well call it
> cat_object() or even better get_object().

Yup; I like get_object (I think I was planning to rename it and then
didn't remember doing so before sending off the patch).  Will rename it.

>> +=item $repo->get_path($tree_sha1, $file_sha1)
> 
> Now we are quickly getting messy again. This should definitely live in
> Git::Tree.

Yup, that's true.  I'll move it into gitweb until we have Git::Tree
(with a comment that it can be moved to Git::Tree once it exists).

>> +=item $repo->get_refs
>> +=item $repo->get_refs($pattern)
> 
> Again, the refs should be properly integrated into the object structure.

Really?  I think it's generally fine for get_refs to exist and to live
in Git::Repo.

Its return value (currently an an arrayref of [$sha1, $object_type,
$ref_name] arrayrefs) might need improvement though, and I find the
$pattern parameter pretty suspect (in that it smells like a for-each-ref
wrapper).  Since get_refs is unused at the moment (gitweb ended up
needing the slightly different show-ref), I'll remove it for now.  (Same
thing about me not being a fan of premature API design applies.)

I keep patches of everything I remove so other people will be able to
use them as starting points; I'll probably post them as "FYI"-patches to
the list at the end of my project, to preserve them for posterity.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html