On Mon, 16 Jun 2008, Lea Wiemann wrote: > Jakub Narebski wrote: >> >> I'd like to have reasoning when, and why, we use which responses (HTTP >> status codes) in which situations. Link to appropriate RFC you used >> (and perhaps to some other information) > > RFC 2616 has clear descriptions of the codes we use. The RFC number was not mentioned in commit message, if I remember correctly. URL like http://tools.ietf.org/html/rfc2616 would also be nice (especially when/if gitweb finally acquires committags support). > Maybe a really short list of conventions though, like you listed > below? That would be nice. Perhaps it should be organized differently, i.e. start with different HTTP error status codes, and when they are used (situations), and not with categories of situations and poposed HTTP codes. > If we have agreement I'll happily put it in. Even if there is no agreement, it is something that can be discussed when talking about given patch sent. >>> For convenience the die_error function now only takes the status code >>> without reason as first parameter >> >> Well, gitweb sometimes used different wording > > The 'reason' only appeared in the HTTP response header -- see below. I'd rather have explanation in HTTP resonse header, not only pure number. For example Test::WWW::Mechanize::CGI shows both code and _description_ when there is error (for get_ok). >>> open my $fd, "-|", git_cmd(), "ls-tree", $base, "--", $path >>> - or die_error(undef, "Open git-ls-tree failed"); >>> + or die_error(500, "Open git-ls-tree failed"); >> >> Should we really use "500 Internal Server Error" here? Usually this >> would be not an error at all, but wrong parameters to git command, >> i.e. it is _user_ who erred, not some error on _server_ side. > > You cannot tell for sure -- all you know here is that the command > somehow failed when it shouldn't have, and so all you can give is 500; > see below. I don't think we should apply reasoning like "most commonly > it's a wrong hash, so let's return 404" -- we don't know, and we > shouldn't assume. I think that in this situation, "open or die_error(...)", if error can happen at all, it can happen only because of some serious error, like git not installed[*1*], or corrupted repository, or something. We would have to examine "close or die_error(...)" to check (perhaps like Junio proposed examining details in a slow path, perhaps catching and examining Error from Git.pm methods, perhaps examining stderr) whether this is user error with 4xx HTTP status code, or server error with 5xx HTTP status code. [*1*] Perhaps we should have upfront in gitweb something like unless (-x "$GIT") die_error("500 Internal Server Error", 'Git binary not found'); or something like that. Here it is _server_ error, and you should mail webmaster / web page admin, and not examine your URL as in case of _user_ error of 4xx variety. >>> ... I suspect that the error could be triggered by non-existent hashes >>> as well, in which case the more specific 404 would be more appropriate >>> -- however, the current implementation oftentimes doesn't allow for >>> more fine-granulared checking >> >> ...that is why I'd rather have "403 Forbidden" as catch-all error, as >> it was done in gitweb. But that also seems not very good idea. > > [IANA HTTP expert, but:] All we have in cases like the one above is "I > cannot deliver the content you requested, but I'm so confused that I > don't even know why" -- which sounds like 500 to me (definitely not > 403). 403 would be used for some deactivated feature (commonly for > disabled directory listings I think) or something that shouldn't be > accessible to the public. I think that "404 Not Found" would be better catch-all than "500 Internal Server Error"... unless we are sure or almost sure that it _is_ server error. >>> sub die_error { >>> - my $status = shift || "403 Forbidden"; >>> - my $error = shift || "Malformed query, file missing or permission denied"; >>> + my $status = shift || 500; >>> + my $error = shift || "Internal server error"; >> >> Errr, I think "Malformed query, file missing or permission denied" is >> actually closer to truth, and better error message > > This default is intended for expressions like "some_action(...) or > die_error" (without parameters), in which case you really can't tell > what triggered the error, so "Internal server error" is as specific as > you can get. I think that most errors are of the above... but I guess that it is a moot point, because even if status sometimes is taken default, I don't think there is callsite that makes use of default for $error, i.e. second argument of die_error() is always set explicitely. >> (it is displayed in body of message) > > Just to make sure we're on the same boat, the $error variable above *is* > what's displayed in the body. True. >>> + # Use a generic "Error" reason, e.g. "404 Error" instead of >>> + # "404 Not Found". This is permissible by RFC 2616. >>> + git_header_html("$status Error"); >> >> I don't think it is _recommended_ practice. IIRC it is recommnded >> [...] to give more detailed explanation in the body of message. > > The code above prints the HTTP response headers, not the body -- the > body contains the detailed error message passed to the die_error > function. Here's a stripped down version of gitweb's error response > (X'ed out against spam filter): > > HTXTP/1.0 404 Error <------------ "Error" here > CoXntent-Type: text/htXml > > ... 404 - Hash not found ... > > I don't know of *any* browser or tool that actually displays the reason > given in the HTTP response header, so defaulting to Error is fine here Test::WWW::Mechanize displays both numeric HTTP response *and* reason given in HTTP response header when get_ok(...) fails. Besides we would want some explanation also on HEAD reuests (supported now only for web feeds, IIRC). > (especially since RFC 2616 explicitly allow it). Could you point me to where it is? And no, I don't think that "The reason phrases listed here are only recommendations -- they MAY be replaced by local equivalents without affecting the protocol." is it. > (I don't want to put the $error passed to die_error into the HTTP > response header, since badly interpolated strings might invalidate the > response. So using "Error" makes sure we always die gracefully.) Errr... I think we can trust gitweb programmers to not screw up. Perhaps there should be comment that first argument should be plain text, while second can be HTML fragment. >> Proposed HTTP server/client error status codes [heavily snipped]: Let us put it in different order. 1. 4xx: Client Error - The request contains bad syntax or cannot be fulfilled 1.1. 400 Bad Request - The request could not be understood by the server due to malformed syntax. Here I think we should put all the cases where we know that URL is invalid even without examining any data, after purely syntaxical checking. This means params with values outside their domain, like for example $page ('pg') parameter which is not a number. It *might* be also used for some unreasonable discrepancies in parameters which need access to git repository to check, like asking for blame view of non-blob... although this could also be "404 Not Found". 1.2. 403 Forbidden - The server understood the request, but is refusing to fulfill it. Authorization will not help (...) (Sometimes as "403 Permission Denied"). Here I think we should put all the cases where access is denied because of gitweb or repo configuration; this I think does not include non-exported projects. This usually means some 'feature' disabled (not enabled). 1.3. 404 Not Found - The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. (...) This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable. Here we should put all the situations where we ask for commit, and does not get it, as for a project, and such project does not exist, ask for a path, and there is no such path in specified tree-ish, asking for a tag while there is nothing under given name, etc. It could also be a reasonable catch-all. 2. 5xx: Server Error - The server failed to fulfill an apparently valid request 2.1. 500 Internal Server Error - The server encountered an unexpected condition which prevented it from fulfilling the request. Here I think we can put problems with git binaries, corrupted repositories, some Perl error etc. Something you should mail webmaster (web admin) about. All "open or die_error(...)" should I guess go there, although I'm not sure when/if that can happen. About "close or die...": those ones would need carefull examination if they should result in 404 or in 500 error code zone. 2.2. 503 Service Unavailable - The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. Perhaps during longer regenerating cache this should be response... -- Jakub Narebski ShadeHawk on #git Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html