"John 'Warthog9' Hawley" <warthog9@xxxxxxxxxxxxxx> writes: > Basic check for the claimed Agent string, if it matches a known > blacklist (wget and curl currently) don't display the 'Generating...' > page. > > Jakub has mentioned a couple of other possible ways to handle > this, so if a better way comes along this should be used as a > wrapper to any better way we can find to deal with this. > > Signed-off-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxxxxxx> > --- > gitweb/lib/cache.pl | 30 ++++++++++++++++++++++++++++++ > 1 files changed, 30 insertions(+), 0 deletions(-) > > diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl > index d55b572..5182a94 100644 > --- a/gitweb/lib/cache.pl > +++ b/gitweb/lib/cache.pl > @@ -116,6 +116,34 @@ sub isFeedAction { > return 0; # False > } > > +# There have been a number of requests that things like "dumb" clients, I.E. wget > +# lynx, links, etc (things that just download, but don't parse the html) actually > +# work without getting the wonkiness that is the "Generating..." page. > +# > +# There's only one good way to deal with this, and that's to read the browser User > +# Agent string and do matching based on that. This has a whole slew of error cases > +# and mess, but there's no other way to determine if the "Generating..." page > +# will break things. > +# > +# This assumes the client is not dumb, thus the default behavior is to return > +# "false" (0) (and eventually the "Generating..." page). If it is a dumb client > +# return "true" (1) > +sub isDumbClient { Please don't use mixedCase, but underline_separated words, e.g. browser_is_robot(), or client_is_dumb(). > + my($user_agent) = $ENV{'HTTP_USER_AGENT'}; What if $ENV{'HTTP_USER_AGENT'} is unset / undef, e.g. because we are runing gitweb as a script... which includes running gitweb tests? > + > + if( > + # wget case > + $user_agent =~ /^Wget/i > + || > + # curl should be excluded I think, probably better safe than sorry > + $user_agent =~ /^curl/i > + ){ > + return 1; # True > + } > + > + return 0; > +} Compare (note: handcrafted solution is to whitelist, not blacklist): +sub browser_is_robot { + return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script + if (eval { require HTTP::BrowserDetect; }) { + my $browser = HTTP::BrowserDetect->new(); + return $browser->robot(); + } + # fallback on detecting known web browsers + return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/); + # be conservative; if not sure, assume non-interactive + return 1; +} from "[PATCHv6 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache" http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163040 http://repo.or.cz/w/git/jnareb-git.git/commitdiff/48679f7985ccda16dc54fda97790841bab4a0ba2 -- Jakub Narebski Poland ShadeHawk on #git -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html