Re: Squid Hardware requirements.

I think that if you can use a good disk controller (with 1 GB+ of cache) and make:
1 RAID10 for the OS with 4 disks
2 RAID10 arrays for 2 disk-cache (cache_dir) stores for squid with 4 disks
each (or even 2 RAID5 arrays with 3 disks each)

you could get good I/O speed (see the sketch below).
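
For illustration only, a minimal squid.conf sketch along those lines (the
mount points, sizes and the aufs store type are assumptions, adjust them to
your arrays and your build):

    # one cache_dir per RAID10 array
    cache_dir aufs /cache1/squid 200000 16 256
    cache_dir aufs /cache2/squid 200000 16 256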
--
Att...

Ricardo Felipe Klein
klein.rfk@xxxxxxxxx


On Fri, Jun 14, 2013 at 7:44 PM, Ricardo Klein <klein.rfk@xxxxxxxxx> wrote:
> On Fri, Jun 14, 2013 at 7:29 PM, Stephan Viljoen <steph@xxxxxxxxxxxx> wrote:
>>
>> I was thinking of buying a Supermicro server with 8 to 16 drive bays and
>> filling it with 15K RPM SAS disks, as I need more disk I/O. I guess a pure
>> memory system will still be the fastest option, but I'm looking for
>> something in between: speeding up browsing and saving as much bandwidth as
>> possible without sacrificing too much speed.
>>
>>
>>
>> -----Original Message-----
>> From: Marcus Kool [mailto:Marcus.Kool@xxxxxxxxxxxxxxx]
>> Sent: Friday, June 14, 2013 5:35 PM
>> To: csn233
>> Cc: Stephan Viljoen; squid-users@xxxxxxxxxxxxxxx; support and sales desk
>> URLfilterDB
>> Subject: Re:  Squid Hardware requirements.
>>
>> On Fri, Jun 14, 2013 at 09:53:20PM +0800, csn233 wrote:
>> > With YMMV in mind, I get different mileage:
>> >
>> > On Fri, Jun 14, 2013 at 7:41 PM, Marcus Kool
>> > <marcus.kool@xxxxxxxxxxxxxxx> wrote:
>> > > and if your network pipe has sufficient capacity, also fetching an
>> > > object again from the internet can be faster than fetching it from
>> > > disk.
>> >
>> > Your network may be fast, but it doesn't imply a fast path between you
>> > and the origin server. In other words, it depends on other factors
>> > than just your own network pipe.
>>
>> Yes, mileage may vary and depends on many factors.
>> Overall, squid servers without a disk cache can be faster than servers
>> with a disk cache, so it is worth looking at.
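>>
>> For example (purely as an illustration, the sizes are not a
>> recommendation), a memory-only proxy is just a large cache_mem and no
>> cache_dir at all; with squid-3.x, leaving cache_dir out means memory-only
>> caching:
>>
>>   cache_mem 16384 MB
>>   maximum_object_size_in_memory 512 KB
>>   # no cache_dir line -> objects are cached in RAM only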
>>
>> > > - more expensive (disks + battery-backed I/O controller)
>> >
>> > Expensive disks/battery-backed are over-kill. More/adequate spindles
>> > should do the job just as well. Why do you need a battery-backed
>> > controller? Squid is not a transaction-based system - if you lose the
>> > cache, tough, do "squid -z" and start again.
>>
>> Fast disks are good. Multiple controllers and multiple buses are good.
>> An EMC disk array is the most expensive and best option, since Squid
>> desires a huge number of IOPS.
>> Battery-backed disk controllers are a good tradeoff: they are not so
>> expensive and give a reasonable performance boost.
>>
>> > > - Squid uses more memory to index the disk cache (14 MB memory per
>> > > GB disk
>> > > cache)
>> >
>> > My memory allocation is only about 20-30% of that formula, and
>> > paging/swapping metrics don't indicate there is a problem. General
>> > formulas may not always apply.
>>
>> The 14 MB per GB is documented in the Squid wiki and is based on the
>> observation that the average object size is 13 KB.
>> If you only see 20-30% of the formula, you may have a larger average
>> object size or may only be using 20-30% of the configured disk cache.
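>>
>> As a worked example (the size is only an illustration): a 100 GB
>> cache_dir at ~13 KB per object holds roughly 8 million objects, and its
>> index costs about 100 x 14 MB = 1.4 GB of RAM, on top of cache_mem and
>> the OS:
>>
>>   # ~100 GB store -> plan for roughly 1.4 GB of index RAM
>>   cache_dir aufs /cache1/squid 102400 16 256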
>>
>> > > unless a redundant hot-swap RAID array is used, less downtime.
>> >
>> > Older versions had a problem if a cache_dir failed, I think. Has this
>> > changed in later versions, or is a change in the pipeline, anyone?
>>
>> The thread started with a web proxy for an ISP.
>> ISPs generally do not want to restart the proxy and/or rebuild the index.
>> It takes too long.
>>
>> > > One can also redistribute budget:
>> > > - use the budget of the disk system to max out memory.
>> >
>> > The benefits of memory will plateau pretty quickly. Unless one
>> > regularly has a whole bunch of users wanting to access the same pages
>> > within a relatively short time, the benefit from more memory has its
>> > limits. Max-out could easily become wastage.
>>
>> No, memory is by far the fastest cache medium. Since memory is relatively
>> cheap, it is the best option.
>>
>> > > - put as much memory as possible.
>> >
>> > Disagree - see above. It depends.
>>
>> OK, I stated it a bit aggressively. It should read "Buy as much memory as
>> your budget allows".
>>
>> > > - carefully size the disk cache; not too large since Squid keeps the
>> > > index
>> >
>> > Agree. If your hit-ratios don't increase, there's not much point in
>> > having larger cache_dir's. But I wouldn't go as far as "carefully".
>> > You just need enough or more, just not too much more.
>>
>> That is your point of view.  I prefer to be careful not to use more than
>> enough since it wastes memory.
>>
>> > > - if using a disk cache, use fast disks and a very good caching I/O
>> > > controller to get maximum disk performance
>> >
>> > Up to a point only, as mentioned above. Local disk I/O may be fast,
>> > but it doesn't mean your internet access will be as well, which means
>> > you can end up spending money on hardware that does not deliver actual
>> > results.
>>
>> Squid is hungry for a large number of IOPS, so get the best that your
>> budget can buy.
>> For low budgets this is a relatively cheap caching disk controller; for
>> high budgets it ranges from low-end to high-end disk arrays (the ones
>> that have between 32 and 1000+ spindles).
>>
>> > As Amos said, get the fastest per-core GHz you can find, number of
>> > cores not important. And have enough disk spindles.
>>
>>
>



