Re: tuning question

On Sat, Jul 12, 2014 at 5:06 PM, Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx> wrote:
Jeff Trawick wrote:

On Sat, Jul 12, 2014 at 1:25 PM, Miles Fidelman <mfidelman@xxxxxxxxxxxxxxxx <mailto:mfidelman@meetinghouse.net>> wrote:

    Hi Folks,

    Every once in a while, a crawler comes along and starts indexing
    our site - and in the process pushes our server's load average
    through the roof.

    Short of blocking the crawlers, can anybody suggest some quick
    tuning adjustments to make, to reduce load (setting the max.
    number of servers and/or requests, renicing processes)?


Use robots.txt to block access to dynamically generated resources which are
expensive to generate and not necessary for search hits?

Is the load coming from a lot of concurrent requests, or mainly from
the cost of the individual requests it is making?

a bit of both


If you want to limit concurrent requests just from web crawlers, try something like mod_qos.  (See http://unix.stackexchange.com/questions/37481/throttling-web-crawlers)
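As a rough sketch of the mod_qos approach (directive names are from the mod_qos documentation, but the User-Agent pattern and the limit of 3 are illustrative - adjust both for your site, and check the module path for your distro):

```apache
# Load mod_qos (module path varies by distro/packaging)
LoadModule qos_module modules/mod_qos.so

# Tag requests whose User-Agent looks like a crawler
# (the pattern here is only an example)
BrowserMatchNoCase "(googlebot|bingbot|baiduspider|crawler|spider|bot)" QS_IsBot

# Allow at most 3 concurrent requests from tagged crawlers;
# further requests get a 500-class error instead of piling up
QS_EventRequestLimit QS_IsBot 3
```

Legitimate users are unaffected; only requests matching the BrowserMatch pattern count against the limit.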

If it were me, I'd try to block needless, expensive requests with robots.txt too.  http://www.robotstxt.org/robotstxt.html
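For example, something along these lines in robots.txt at the document root - the paths here are hypothetical, so substitute whatever your expensive dynamic URLs actually are (and note that Crawl-delay is non-standard, honored by some crawlers and ignored by others, notably Googlebot):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
Crawl-delay: 10
```

Well-behaved crawlers will skip the disallowed paths entirely; badly behaved ones ignore robots.txt, which is where something like mod_qos comes in.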






--
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx




--
Born in Roswell... married an alien...
http://emptyhammock.com/

