Re: [RFC] Implementing gitweb output caching - issues to solve

>> Interesting.  http://www.user-agents.org/ seems to suggest that many
>> robots do use Mozilla (though I don't think it's worth bending over
>> backwards to help them see the page correctly).

If a robot reports itself and we don't know about it, I'm fine with
giving it the 'Generating...' page instead of what it's expecting.
The number of robots and similar clients that won't handle the meta
refresh is far smaller than the number of people clicking with
eyeballs on a screen.
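
For the curious, the dispatch I have in mind looks roughly like this;
generate_page_now() and generating_placeholder_page() are illustrative
stand-ins, not actual gitweb code:

        use HTTP::BrowserDetect;

        my $bd = HTTP::BrowserDetect->new($ENV{'HTTP_USER_AGENT'});

        if ( $bd->robot() ) {
                # Dumb clients can't be trusted to follow the meta
                # refresh, so block and hand them the finished page.
                print generate_page_now();
        } else {
                # Humans get the 'Generating...' placeholder, which
                # reloads itself via <meta http-equiv="refresh"> until
                # the cached page is ready.
                print generating_placeholder_page();
        }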

>> HTTP::BrowserDetect uses a blacklist as far as I can tell.  Maybe in
>> the long term it would be nice to add a whitelist ->human() method.
>>
>> Cc-ing Olaf Alders for ideas.
> 
> Thanks for including me in this.  :)  I'm certainly open to patching
> the module, but I'm not 100% clear on how you would want to implement
> this.  How is ->is_human different from !->is_robot?  To clarify, I
> should say that from the snippet above, I'm not 100% clear on what
> the problem is which needs to be solved.

At this point I don't really see an issue with HTTP::BrowserDetect's
robot() function, and I agree with human = !->is_robot.
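
(If someone wants the whitelist spelling today, a trivial wrapper
gets you there; is_human() below is purely illustrative:)

        use HTTP::BrowserDetect;

        # Illustrative only: 'human' is just the negation of robot().
        sub is_human {
                my ($user_agent) = @_;
                my $bd = HTTP::BrowserDetect->new($user_agent);
                return !$bd->robot();
        }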

One thing I would like to see is some way to add entries to the list
of things checked for.  As you are probably aware, more agents exist
in the wild than what you have set up, so for now I'm handling it
with the following:

use HTTP::BrowserDetect;

# @additional_dumb_clients holds extra user-agent substrings,
# configured elsewhere, that should also be treated as robots.
sub is_dumb_client {
        my $user_agent = lc($ENV{'HTTP_USER_AGENT'} || '');

        my $browser_detect = HTTP::BrowserDetect->new($user_agent);

        # Known robot according to HTTP::BrowserDetect's built-in list.
        return 1 if ( $browser_detect->robot() );

        # Fall back to our own additions that the module doesn't
        # know about yet.
        foreach my $adc ( @additional_dumb_clients ) {
                return 1 if ( index( $user_agent, lc $adc ) != -1 );
        }

        return 0;
}

which could be simplified if there were just some way to do

        my $user_agent = lc($ENV{'HTTP_USER_AGENT'} || '');

        my $browser_detect = HTTP::BrowserDetect->new($user_agent);

        # add_robots() is the wished-for addition, not a real method.
        $browser_detect->add_robots( @array );

        return 1 if ( $browser_detect->robot() );

Not sure that particularly generalizes, and honestly it's only four
lines of code to add the extra checks.
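
In the meantime, a subclass could approximate the proposed
add_robots() without touching the module.  Sketch only: add_robots()
is not a real HTTP::BrowserDetect method, and this assumes the
module's new() blesses into the subclass:

        package My::BrowserDetect;
        use strict;
        use warnings;
        use parent 'HTTP::BrowserDetect';

        # Shared across instances for simplicity; a real patch would
        # want to store these properly.
        my @extra_robots;

        # Hypothetical method: not part of HTTP::BrowserDetect's API.
        sub add_robots {
                my ($self, @patterns) = @_;
                push @extra_robots, map { lc } @patterns;
        }

        sub robot {
                my ($self) = @_;
                # Defer to the built-in detection first...
                return 1 if $self->SUPER::robot();
                # ...then check our own substring additions.
                my $ua = lc($self->user_agent || '');
                foreach my $pattern (@extra_robots) {
                        return 1 if index($ua, $pattern) != -1;
                }
                return 0;
        }

        1;

With that, My::BrowserDetect->new($user_agent) plus one add_robots()
call drops in where the four-line loop above sits.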

- John 'Warthog9' Hawley

