Re: [PATCH 2/3] add new Git::Repo API

Lea Wiemann <lewiemann@xxxxxxxxx> · Fri, 18 Jul 2008 15:40:01 +0200

Jakub Narebski wrote:
> [$commit->author:] We would probably want _in the future_ to  return some
> object wrapper, which stringifies to value of author and committer headers

Yup, good idea.  They'll even stay strings, they'll just be blessed.

>>>> +=item $commit->message
> 
> I'd rather then have _git_ convert it to UTF=8 for us (using 
> --encoding=<encoding> option to git-log/git-rev-list)

Yeah, I guess the API should actually decode it.  You wouldn't want to
have the message in UTF-8 but in Unicode (I suggest you read man
perlunitut if you haven't done so).  We cannot have git do the decoding,
since (apart from the fact that it doesn't smell right) it isn't
guaranteed to emit valid UTF-8 (thanks Junio for the pointer):

Lea Wiemann: Does anyone know off the top of their heads how git handles
    character decoding errors in commands like git log? [...]
Junio 'gitster' Hamano: silently punt and show the original unmolested.
Junio 'gitster' Hamano: cf. pretty.c:pretty_print_commit()

So we're not guaranteed to be able to, in turn, decode git's output into
actual characters since it might just be byte soup.

Hence, how about this fallback strategy:

1. Decode according to the encoding header.
2. Decode as UTF-8 (passing through byte soup is often equivalent to
decoding UTF-8 since many terminals use UTF-8, and trying UTF-8 is
reasonably safe).
3. Decode as Latin1.

(Not that the fallbacks will matter a lot in practice, I think.)

> It is (much) better than forking git-cat-file for each commit shown
> on the list; nevertheless I think that it would be better to use git-log
> to generate list (or Git::Revlist) of Git::Commit objects.  It is one
> fork less, but what more important you don't have to access repository
> twice for the very same objects.

You're confused; it's not one fork less, it's a write to a pipe less.
(Pleeeease look at the code before you write something.  It's there, in
this very thread.)  And I don't believe the "access the repository
twice" thing is anywhere near an actual issue.  To summarize, you're
asking me to (a) write code and (presumably) (b) add something to the
interface of a public API, based on some (most probably faulty)
assumptions about performance?  You should really read
<http://c2.com/cgi/wiki?PrematureOptimization>.

> if I understand correctly for log-like views you 
> propose to first run simple git-rev-list [...], then feed list of
> revisions (perhaps via cache, i.e. excluding objects which are in
> cache) to 'git cat-file --batch' open two-directional pipeline.

Yup, it's an option, though currently it's a single cached call to git
log (or git rev-list).

> What I propose instead is to provide alternate method to fully 
> instantiate Git::Commit object (in addition to ->_load), which would 
> fill fields by parsing git-log / git-rev-list --headers output

Yes, but this would need a method in the API, it's not an optimization
that falls out for free.  Cluttering an API for some obscure (= very
doubtful) optimization?  Bad Idea.(tm)

> "git cat-file --batch" should have commits to be
> accessed in filesystem cache, which means in memory; but it is possible
> that they wouldn't be in cache because of I/O pressure

No.  Page cache turnover time is at least around 10 seconds (and that's
under fairly artificial conditions), definitely not in the millisecond
range.

> I think that _not using_ Git::Cmd (or somesuch) API results in botched,
> horrible API
>   our $git_version = $repo_root->repo(directory => 'dummy')->version;
> (Unless it is not needed any longer, or not used any longer; if it is
> so, then perhaps implementing Git::Cmd as generic wrapper around git
> commands, hiding for example ActivePerl hack, could be left for later).

It isn't used any longer -- I really suggest you read the whole thread
before replying. ;-)

>> As I wrote in my reply to Petr [...]
> 
> Just a question: was this reply only to him, or to all?

To all, otherwise I wouldn't have Cc'ed the list.

>> I wouldn't -- see my blurb about error handling at the top of my reply
>> to Petr (<487BD0F3.2060508@xxxxxxxxx>).  You're not supposed to pass
>> anything that you didn't get from get_sha1 into Git::Commit or
>> Git::Tag constructors, or your error handling is invariably broken.
> 
> I can understand this simpler, although less than optimal, and geared
> mainly towards gitweb needs.

FTR, yes it is simpler, but no, it is not really geared toward gitweb
needs, and it's definitely not "less than optimal" in the sense of being
worse than the exception-based error handling Git.pm does.  Trust on me
on this one. ;-)

> Errr... "yes" to first question means that 'reuse' option makes sense
> _only_ for get_bidi_pipe? If so, why it is present in other commands?

Yes, and no, it isn't present in other commands.  (Hey, could you please
check the code before posting?  Really.)

>>> Why name_rev, and no describe?
>> Feel free to add it. ;-)  (It might take some work to come up with a
>> decent interface for that method.)
> 
> Why do you _need_ name_rev, if you are not to include git-describe
> equivalent.

I needed it for gitweb.  As I said, I'm not trying to create a complete
API.  A describe_rev (or so) method can be added later, if and when it's
needed.  (As I said, I don't think writing APIs without at least one use
case is a good idea anyway.)

>>> Should (for completeness) Git::Tag provide $tag->validate() method?
> 
> I meant here equivalent of "git tag -v <tag>"

I guess it could be added.  As with describe_rev, I won't add it myself,
in particular not as part of this patch series.

-- Lea
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html