Re: Squid-2, Squid-3, roadmap

Mark Nottingham <mnot@xxxxxxxxxxxxx> · Fri, 7 Mar 2008 10:33:30 +1100

Ideally, you'd avoid locking as much as possible; e.g., have a pool of  
threads for disk access (as now with aufs), a pool for header parsing,  
a pool for forward requests, and so on. I don't think it's a good idea  
at all to re-architect squid into a thread-per-connection model or  
anything; just find the places that are bottlenecks and allow some  
parallelism, keeping the number of threads low.
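
A minimal sketch of the "small pool per subsystem" idea, assuming Python's standard `concurrent.futures` module rather than anything in Squid itself. The pool names, pool sizes, and the `parse_headers`, `read_object` and `handle` helpers are hypothetical. In CPython the global interpreter lock limits CPU parallelism between threads, so this shows the shape of the design, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pools: names and sizes are illustrative, not Squid's.
disk_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="disk")
parse_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="parse")


def parse_headers(raw: bytes) -> dict:
    """Parse a raw HTTP header block into a dict keyed by lower-cased name."""
    headers = {}
    # Skip the request line, then stop at the blank line ending the headers.
    for line in raw.decode("latin-1").split("\r\n")[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def read_object(store: dict, key: str) -> bytes:
    """Stand-in for a blocking disk read."""
    return store.get(key, b"")


def handle(raw: bytes, store: dict) -> bytes:
    # Each stage runs in its own small pool. The stages share no mutable
    # state, so no locks are needed here.
    headers = parse_pool.submit(parse_headers, raw).result()
    return disk_pool.submit(read_object, store, headers.get("host", "")).result()
```

Here `handle` waits on each result for brevity; an event-driven caller would more likely attach a callback to the returned future instead of blocking.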

(says he, the non-threads programmer. I'm not *that* crazy...)

Redirectors and other helpers are already able to run on other CPUs,  
so that's a non-issue.

Cheers,

On 07/03/2008, at 3:05 AM, Adrian Chadd wrote:

Well, the way I'd approach it is to first get an idea of how to throw
things into 'threads', and probably draft and craft a basic event loop
and submission queue for "stuff" to happen across threads.

Then "Squid" can run as one thread, and CPU intensive stuff can happen
via message queues to other threads.
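
One way the submission queue described above could look, as a rough sketch using Python's standard `queue` and `threading` modules. The names `submissions`, `completions`, `worker` and `run_main_loop` are made up for illustration, and a real event loop would also be polling sockets rather than only waiting on completions.

```python
import queue
import threading

submissions = queue.Queue()   # main thread -> worker
completions = queue.Queue()   # worker -> main thread


def worker() -> None:
    """Run submitted jobs until a None sentinel arrives."""
    while True:
        job = submissions.get()
        if job is None:
            break
        job_id, func, args = job
        completions.put((job_id, func(*args)))


def run_main_loop(jobs: list) -> dict:
    """Single 'Squid' thread: submit work, then collect results as messages."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    for job_id, (func, args) in enumerate(jobs):
        submissions.put((job_id, func, args))
    results = {}
    while len(results) < len(jobs):
        job_id, value = completions.get()
        results[job_id] = value
    submissions.put(None)     # tell the worker to exit
    thread.join()
    return results
```

The main thread never shares data structures with the worker; everything crosses the boundary as a message, which is the point of the design.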

Eventually my gut feeling (reliable as it is) tells me that the most
efficient and scalable way of doing this is to create a lightweight
"squid" that handles just client and server-side interactions, with
storage, logging, ACLs and other stuff happening in other threads, and
then create multiple "squid" threads that run almost independently from
one another. This would avoid all of the crazy fine-grained locking
that is traditionally done to take a non-threaded app into the threaded
world. I really think avoiding that is a very good idea.
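
The layout described above can be sketched as several near-independent workers with private state, plus one shared service thread reached only through a queue. This is an illustration under assumptions, not Squid code: the `Worker` class, the `logger` thread, and pinning clients to workers by CRC32 hash are all hypothetical choices.

```python
import queue
import threading
import zlib

log_queue = queue.Queue()
log_lines = []


def logger() -> None:
    """Shared service thread: the only code that touches log_lines."""
    while True:
        line = log_queue.get()
        if line is None:
            break
        log_lines.append(line)


class Worker(threading.Thread):
    """One lightweight 'squid': private inbox and private state, no locks."""

    def __init__(self, index: int) -> None:
        super().__init__(daemon=True)
        self.index = index
        self.inbox = queue.Queue()
        self.seen = {}            # per-worker state, never shared

    def run(self) -> None:
        while True:
            client = self.inbox.get()
            if client is None:
                break
            self.seen[client] = self.seen.get(client, 0) + 1
            log_queue.put(f"worker{self.index} served {client}")


def serve(clients: list, n_workers: int = 2) -> list:
    log_thread = threading.Thread(target=logger, daemon=True)
    log_thread.start()
    workers = [Worker(i) for i in range(n_workers)]
    for w in workers:
        w.start()
    for client in clients:
        # Pin each client to one worker so its state stays in one thread.
        workers[zlib.crc32(client.encode()) % n_workers].inbox.put(client)
    for w in workers:
        w.inbox.put(None)
        w.join()
    log_queue.put(None)
    log_thread.join()
    return workers
```

Because a given client always lands on the same worker, the per-worker dictionaries need no locking; only the log crosses threads, and it does so as messages.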

Oh, and no, there's nothing in Squid right now that "jumps out" save
perhaps pushing regular expression lookups into a separate thread or
threads. But really, if you're going to do that then you're better off
pushing a large part of the ACL subsystem into separate threads and
having the main code submit lookup requests there. Of course, what
would be interesting there is benchmarking how effective it'd be to
batch things like ACL lookups in "groups" to try and get some cache
coherency effects going, rather than the current tendency for Squid to
process a request as far as it can go before something blocking comes
along, blowing away as much of the CPU cache as possible in the
meantime.
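
To make the batching idea concrete, here is a small sketch contrasting the two loop orders. The `ACLS` list and both function names are hypothetical and do not reflect Squid's ACL syntax or code. Python will not show CPU cache effects, so this only illustrates the reordering; whether it pays off is exactly what would need benchmarking.

```python
import re

# Hypothetical ACL list: (name, compiled pattern). Not Squid's ACL syntax.
ACLS = [
    ("ads", re.compile(r"^https?://ads\.")),
    ("video", re.compile(r"\.(mp4|flv)$")),
]


def check_one_at_a_time(urls: list) -> list:
    """Current style: carry each request through every ACL before the next."""
    return [[name for name, pat in ACLS if pat.search(url)] for url in urls]


def check_batched(urls: list) -> list:
    """Batched style: run one ACL across the whole batch, then the next ACL."""
    matches = [[] for _ in urls]
    for name, pat in ACLS:                 # outer loop over ACLs
        for i, url in enumerate(urls):     # inner loop over the batch
            if pat.search(url):
                matches[i].append(name)
    return matches
```

Both functions return the same answers; only the order in which the work touches each pattern changes.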

But really, the big problem is to spend some time looking at efficient
ways of parallelising network applications and what works well on
current hardware/OSes. I'm just playing around with a simple TCP proxy
right now which I'll use to experiment with "better" ways of doing
stuff reasonably portably. I can then set this as the "upper bounds"
for how well stuff may perform, and can then spend some time looking at
how to tune things like parallelism, IO handling, memory allocation and
event notification. Then I can spend some more time looking at batching
operations such as IO, ACL lookups, etc - see if better use of CPU
caches can be made and also see if doing all the system read/write
syscalls in one hit per loop rather than spread out throughout the
program execution makes any difference.
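
A rough sketch of "all the read/write syscalls in one hit per loop", assuming Python's standard `selectors` and `socket` modules and blocking sockets. The function name `echo_once` and the upper-casing "processing" step are placeholders; this is one possible arrangement, not the experimental proxy mentioned above.

```python
import selectors
import socket


def echo_once(sel: selectors.BaseSelector) -> int:
    """One loop iteration: all reads, then all processing, then all writes."""
    ready = [key.fileobj for key, _ in sel.select(timeout=1.0)]
    # Phase 1: every read syscall back to back.
    inbound = [(sock, sock.recv(4096)) for sock in ready]
    # Phase 2: pure computation, no syscalls (here: upper-case the payload).
    outbound = [(sock, data.upper()) for sock, data in inbound if data]
    # Phase 3: every write syscall back to back.
    for sock, data in outbound:
        sock.sendall(data)
    return len(outbound)


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()
    sel.register(server, selectors.EVENT_READ)
    client.sendall(b"hello")
    echo_once(sel)
    print(client.recv(4096))
```

Timing this loop against one that reads, processes and writes each socket in turn would be one way to measure whether the grouping makes any difference.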

It's really hard to benchmark -these- inside Squid, and thus it's very
difficult to figure out how to make better use of current hardware.
_This_ is the "First Problem" to solve.

Of course, all of this depends entirely on whether I get enough clients
to start funding some of this work, and how much I can dedicate to this
over my Semantics, Experimental Methods and Behavioural Neuropsychology
classes this semester. :)

Adrian
(Sleep? Hah!)

On Thu, Mar 06, 2008, Chris Woodfield wrote:
I'll readily admit that I Am Not A Developer, but I'm wondering if
this could be something that could be worked incrementally - finding
easy-to-cleave-off subsystems that can be moved to separate threads
similarly to how asyncio was. The most obvious one I can think of is
the front-end client/server network socket communication code; next
would be logging. Are there any other subsystems that jump out as
"independent" enough to do this in the existing code base?

-C

On Mar 6, 2008, at 4:17 AM, Adrian Chadd wrote:

On Wed, Mar 05, 2008, Michael Puckett wrote:
Mark Nottingham wrote:

A killer app for -3 would be multi-core support (and the perf
advantages that it would bring), or something else that the
re-architecture makes possible that isn't easy in -2. AIUI, though,
that isn't the case; i.e., -3 doesn't make this significantly
easier.

Absolutely THE killer app for either -2 or -3. The fact that multi-core
processors are now the de facto standard in any box makes this more
important by the day IMHO. Being able to do sustained IO across
multiple Gb NICs will absolutely require it. This is the single biggest
performance enhancement that could be implemented. So where does
multi-core support fall on either roadmap?

12 months away on my draft Squid-2 roadmap, if there was enough
commercial interest. Thing is, the Squid internals are very horrible
for SMP (both 2 and 3) and the list of stuff that I've put into the
squid-2 roadmap is what I think is the minimum amount of work required
before really starting to take advantage of multiple cores.

Adrian

--
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -

--
Mark Nottingham       mnot@xxxxxxxxxxxxx