[RFC] gitweb TODO

Jakub Narebski <jnareb@xxxxxxxxx> · Fri, 17 Nov 2006 19:01:40 +0100

These are a few gitweb issues and features I'm currently working on 
(or plan to work on).

1. New patchset view (commitdiff, blobdiff)

In the "old" gitweb, the commitdiff view was generated by iterating over
the lines of git-diff-tree raw format output, and generating diffs using
git-cat-file and an external diff utility (/usr/bin/diff). This required
a temporary directory for diff generation, and of course the diffs
didn't have extended git headers.

The "new" commitdiff view is generated from the output of a single
git-diff-tree --patch-with-raw call. But I made the incorrect assumption
that one line of "raw" diff-tree output always corresponds to exactly
one patch in the patchset part of the output. This is not the case. I'm
not sure whether these are the only cases where a patch is split, but
changing a file into a symlink or a symlink into a file ('T' status),
and explicit breaking ('B' status), each generate two patches for one
line of raw difftree output. The second case is not of much importance
for gitweb, unless you add -B to @diff_opts, but the first is important;
it is currently broken, see
   http://tinyurl.com/y3cfop
(commit 4c52c0d31f0f7142d81a465c40789befc2e86548 on the
gitweb-test-funny-char branch in the git.git repository).

I have thought of the following (mutually exclusive) ways to fix this:

 a. Change the core git-diff-tree command so that it does not break
    (some of?) the T changes into two patches. From what gitster said
    on #git, this feature exists so that git-diff patches are patch(1)
    compatible; but -M already makes patches incompatible with patch.
    I'm thinking here about adding some kind of -s/--single-patch
    --do-not-break-patch-into-two-please command line option to
    git-diff.

 b. Check the raw difftree line for status and perhaps other info
    to know if the line generates more than one patch. This needs
    detailed knowledge about _when_ git-diff generates more than one
    patch for one "raw" format line, and would break if core diff
    changes in that detail. Simplest to implement, I think...

    Could you tell me all the cases when git generates more than one
    patch for one "raw" diff format line, please?

 c. "Cache" the git diff header, or the whole patch, or the whole
    patchset. It is enough to cache (write lines to a "buffer"/"cache"
    array) up to the extended header "^index" line, which can be used
    to check whether to go to the next difftree "raw" line (or even
    which of the "raw" difftree lines this particular patch corresponds
    to). This does not require changes in diff core, and is less
    fragile, less susceptible to breakage.
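
To make option (c) more concrete, here is a rough sketch of matching
patches to "raw" lines through the "index" extended header line. It is
written in Python for brevity, not in gitweb's Perl, and every name in
it (parse_raw_line, patch_matches_raw, assign_patches) is hypothetical.
The matching rule itself is my assumption: a patch belongs to a raw
line if each hash on its "index" line is either all zeros or a prefix
of the corresponding hash on the raw line. Patches without an "index"
line (for example pure renames or mode-only changes) and combined
diffs are not handled.

```python
import re

# ":<from mode> <to mode> <from id> <to id> <status>[score]\t<path(s)>"
RAW_RE = re.compile(
    r'^:(\d{6}) (\d{6}) ([0-9a-f]+) ([0-9a-f]+) ([A-Z])(\d*)\t(.*)$')
# "index <from id>..<to id>[ <mode>]" from the extended diff header
INDEX_RE = re.compile(r'^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: \d{6})?$')


def parse_raw_line(line):
    """Split one non-combined "raw" difftree line into its fields."""
    m = RAW_RE.match(line)
    if m is None:
        return None
    return {
        'from_mode': m.group(1), 'to_mode': m.group(2),
        'from_id': m.group(3), 'to_id': m.group(4),
        'status': m.group(5), 'paths': m.group(7),
    }


def is_null_id(object_id):
    """True for the all-zero id used for a missing side."""
    return set(object_id) == {'0'}


def patch_matches_raw(index_line, raw):
    """Assumed rule: each side is all zeros or a prefix of the raw id."""
    m = INDEX_RE.match(index_line)
    if m is None or raw is None:
        return False
    from_id, to_id = m.groups()
    from_ok = is_null_id(from_id) or raw['from_id'].startswith(from_id)
    to_ok = is_null_id(to_id) or raw['to_id'].startswith(to_id)
    return from_ok and to_ok


def assign_patches(raw_lines, index_lines):
    """For each patch (given by its "index" line, in output order),
    return the position of the raw line it belongs to, or None."""
    raws = [parse_raw_line(line) for line in raw_lines]
    result = []
    pos = 0
    for index_line in index_lines:
        # Stay on the current raw line while the patch still matches it;
        # otherwise advance to the next raw line that does match.
        while pos < len(raws) and not patch_matches_raw(index_line, raws[pos]):
            pos += 1
        result.append(pos if pos < len(raws) else None)
    return result
```

With a 'T' raw line followed by an 'M' raw line, the two patches
produced for the type change would both be assigned to the first raw
line, and the third patch to the second one.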

Which of those would be the best to implement?

2. Difftree combined diff gitweb "raw" format

Currently the "commitdiff" view consists of the gitweb representation of
"raw" git-diff output (list of changed files a la git-whatchanged), and
the patchset (a la git-show). The "commit" view has only the list of
changed files, nearly exactly the same as in the "commitdiff" view (but
with links to the blobdiff view instead of links to the appropriate
patch in the "commitdiff" view).

I have thought about using one of the combined diff outputs for merge
commits. The problem is how to represent the whatchanged part. Which
parts of the gitweb difftree output should be kept? And what about the
fact that we have raw output for the -c/--combined diff format, but not
for the chunk simplifying --cc (compact combined) output?
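
For reference when deciding what to show, this is a small sketch of
what a combined "raw" line carries: for N parents it starts with N
colons, followed by N+1 modes, N+1 object ids, and N status letters
concatenated into one word, then a tab and the path. The sketch is in
Python rather than gitweb's Perl, and the function name
parse_combined_raw is hypothetical.

```python
def parse_combined_raw(line):
    """Split one combined ("-c") raw diff line into per-parent fields."""
    meta, path = line.split('\t', 1)
    # The number of leading colons equals the number of parents.
    nparents = len(meta) - len(meta.lstrip(':'))
    fields = meta[nparents:].split(' ')
    return {
        'nparents': nparents,
        # One mode and one id per parent, then one for the merge result.
        'parent_modes': fields[:nparents],
        'result_mode': fields[nparents],
        'parent_ids': fields[nparents + 1:2 * nparents + 1],
        'result_id': fields[2 * nparents + 1],
        # One status letter per parent, e.g. "MM".
        'status': list(fields[2 * nparents + 2]),
        'path': path,
    }
```

So for a two-parent merge there is one status letter and one
(mode, id) pair per parent that the whatchanged part could display.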

3. Committags support (and implementation)

There were some proposed implementations here, both by me and by Junio,
but no definite patches were accepted. We have the following mutually
exclusive ways to do this:

 a. Do esc_html first, then do all committags simultaneously. The
    advantage is that it is perhaps slightly faster; the disadvantages
    are additional complication in the code, and the fact that the
    regexps defining committags have to either match esc_html-ed text,
    or be converted to work on esc_html-ed input.

 b. Have a chain of committags, and apply committags sequentially. This
    means that we have to divide the output into parts to be parsed
    further and parts which should not be parsed further; here Junio
    proposed the wonderful idea of a list mixing strings (to be parsed
    for committags) and string references (to be left as is). As the
    last "committag" we perform esc_html (unless, for example, some
    project stores commit messages in HTML, XML, or some structured
    text like AsciiDoc or reStructuredText).
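
A minimal sketch of option (b), to show the shape of the idea rather
than a proposed implementation. It is in Python, not gitweb's Perl, so
the "string reference" is played by a small wrapper class (Done); the
names apply_committag, format_message and BUG_TAG, and the
bugs.example.org URL, are all made up for illustration.

```python
import html
import re


class Done:
    """Finished HTML markup that later committags must not touch."""

    def __init__(self, markup):
        self.markup = markup


def apply_committag(parts, pattern, render):
    """Run one committag over the still-unparsed (plain str) parts."""
    out = []
    for part in parts:
        if isinstance(part, Done):
            out.append(part)          # already markup, leave as is
            continue
        pos = 0
        for m in pattern.finditer(part):
            if m.start() > pos:
                out.append(part[pos:m.start()])   # text before the match
            out.append(Done(render(m)))           # the match becomes markup
            pos = m.end()
        if pos < len(part):
            out.append(part[pos:])                # text after the last match
    return out


def format_message(text, committags):
    """Apply the committags in order; escaping acts as the last one."""
    parts = [text]
    for pattern, render in committags:
        parts = apply_committag(parts, pattern, render)
    return ''.join(
        p.markup if isinstance(p, Done) else html.escape(p) for p in parts)


# Hypothetical committag: link "bug #N" to an imaginary bug tracker.
BUG_TAG = (
    re.compile(r'bug #(\d+)'),
    lambda m: '<a href="https://bugs.example.org/%s">%s</a>'
              % (m.group(1), html.escape(m.group(0))),
)
```

Each committag sees only raw, unescaped text, so its regexp does not
need to know about esc_html; the price is one pass over the list per
committag.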

Which of those would be better to implement?

4. Feeds (RSS, Atom,...)

There is a new Atom format feed, and there was a request for per-branch
feeds. The feed output certainly needs cleanup. The questions are: 

a.) What should be in the feed (only commit message + authorship, or 
also whatchanged; what format to use for whatchanged; whether to add 
information about tags created within a given window of time, etc.)? 

b.) Should we use "text/xml" or (unstandardized) "application/rss+xml" 
as type/Content-Type for RSS 2.01 format, and should we use 
"application/atom+xml" for Atom format?

c.) Should we add APP (Atom Publishing Protocol) support in addition to 
OPML?

d.) Should we put the log, shortlog, history, rss, and atom views 
together in one subroutine, and select the view using a $format 
parameter of that subroutine?
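
To illustrate what (d) could look like, here is a rough sketch of one
entry point that selects the view by a format parameter. It is in
Python rather than gitweb's Perl; the names (FORMATS, git_log_generic,
the render_* functions) are hypothetical, the renderers produce only
fragments and not complete pages or valid feeds, and "text/plain" for
the shortlog is just a placeholder for this sketch.

```python
from xml.sax.saxutils import escape


def render_shortlog(commits):
    """One line per commit: abbreviated id and subject."""
    return '\n'.join(
        '%s %s' % (c['id'][:7], c['subject']) for c in commits)


def render_atom_entries(commits):
    """Bare <entry> fragments only; a real feed needs much more."""
    return '\n'.join(
        '<entry><id>%s</id><title>%s</title></entry>'
        % (c['id'], escape(c['subject']))
        for c in commits)


# format name -> (Content-Type, renderer)
FORMATS = {
    'shortlog': ('text/plain', render_shortlog),
    'atom': ('application/atom+xml', render_atom_entries),
}


def git_log_generic(commits, fmt):
    """Single entry point; the view is chosen by the fmt parameter."""
    content_type, render = FORMATS[fmt]
    return content_type, render(commits)
```

The commit list is gathered once and only the last step differs per
format, which is the main attraction of merging these views.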

-- 
Jakub Narebski
Poland