Re: "concurrency" attribute external_acl_type

Chris Robertson <crobertson@xxxxxxx> · Mon, 06 Apr 2009 16:27:40 -0800

louis gonzales wrote:
List,
1) for the "concurrency" attribute does this simply indicate how many
items in a batch will be sent to the external helper?

No.  There is no such thing as a "batch" in HTTP.

1.1) assuming concurrency is set to "6" for example, and let's assume
a user's browser session sends out "7" actual URL's through the proxy
request - does this mean "6" will go to the first instance of the
external helper, and the "7th" will go to a second instance of the
helper?

Yes.

1.1.1) Assuming the 6 from the first part of the batch return "OK" and
the 7th returns "ERR", will the user's browser session, render the 6
and not render the 7th?

Again you use batch.  Every request that is passed to a helper and is 
not blocked (by an http_access deny or a http_reply_access deny) will 
have its response passed to the browser.

  More importantly, how does Squid know that
the two batches - one of 6, and one with 1, for the 7 total, know that
all 7 came from the same browser session?

It doesn't.

What I have currently:
- openldap with postgresql, used for my "user database", which permits
me to use the "auth_param squid_ldap_auth" module to authenticate my
users with.
- a postgresql database storing my acl's for the given user database

Process:
Step1: user authenticates through squid_ldap_auth
Step2: the user requested URL(and obviously all images, content, ...)
get passed to the external helper

This is where you go awry.  The user requested URL 
(http://www.google.com) will be passed to the helper.  If that URL 
results in an OK being passed back and nothing else prevents this 
request, the contents of that URL will be passed back to the browser.  
The browser will interpret the web page, and make a number of additional 
requests (in this example, that would include the Google logo and some 
sourced JavaScript).  Each of those requests will be handled in a like 
manner (perhaps resulting in still additional requests, such as 
JavaScript requesting images).
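
To make that concrete, here is a minimal Python sketch of the point that 
each request is judged on its own.  The user name, the URLs and the 
check() function are all made up for illustration; a real helper would 
query the ACL database instead of a hard-coded set.

```python
# Hypothetical per-user blocklist standing in for the PostgreSQL ACL table.
DENIED = {("louis", "http://ads.example.net/banner.js")}


def check(user, url):
    """Decide one request on its own, with no knowledge of any other request."""
    return "ERR" if (user, url) in DENIED else "OK"


# Requests a browser might make after fetching one page (made-up URLs).
page_requests = [
    "http://www.example.com/",
    "http://www.example.com/logo.png",
    "http://ads.example.net/banner.js",
]

# Each URL gets its own verdict; nothing ties the three together.
results = {url: check("louis", url) for url in page_requests}
for url, verdict in results.items():
    print(verdict, url)
```

The page and the logo are allowed while the script is denied, and 
nothing in the exchange marks the three as belonging to one page view.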

Step3: external helper checks those URL's against the database for the
specific user and then determines "OK" or "ERR"

Issue1:
How to have the user requested URL(and all images, content, ...) get
passed as a batch/bundle, to a single external helper instance, so I
can collectively determine "OK" or "ERR"

This is impossible due to the nature of the HTTP protocol.  There is no 
such thing as a "batch" or a "session".  Cookies were implemented to 
bypass this on a per-site basis.

Any ideas?  Is the "concurrency" attribute to declare a maximum number
of "requests" that go to a single external helper instance?

Concurrency is the maximum number of *simultaneous* requests that a 
single external helper will handle.

  So if I
set concurrency to 15, should I have the external helper read count++
while STDIN lines come in, until no more, then I know I have X number
in a batch/bundle?

No.  There would be no way to know that the 15 requests are in any way 
related.  The helper would allow or deny all 15 based on whether one of 
the requests is or is not okay, and it would block waiting for a full 
queue of 15 requests before handling any.

Obviously there is no way to predetermine how many URL's/URI's will
need to be checked against the database, so if I set concurrency to
1024, "presuming to be high enough" that no single request will max it
out, then I can just count++ and when the external helper is done
counting STDIN readlines, I can process to determine "OK" or "ERR" for
that specific request?

Raising the number to 1024 would (hopefully, by now, obviously) be an 
even worse idea.

Issue2:
I'd like to just have a single external helper instance start up, that
can fork() and deal with each URL/URI request,

That is exactly what concurrency expects.

 however, I'm not sure
Squid in its current incarnation passes enough information OR doesn't
permit specific enough passback (from the helper) information, to make
this happen.

A concurrent-enabled helper is passed (and is expected to pass back) a 
"query channel" tag to identify which response corresponds to which request.
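
As a rough illustration of that exchange, here is a minimal Python 
sketch of a helper loop.  It assumes the external_acl_type format is 
"%LOGIN %URI", so that each line arrives as "channel user url"; the 
is_allowed() lookup, the user names and the URLs are hypothetical 
placeholders for the PostgreSQL check.  Each reply is written as soon as 
its request is decided, prefixed with the tag it arrived with.

```python
import io

# Hypothetical stand-in for the per-user ACL lookup in PostgreSQL.
BLOCKED = {("louis", "http://blocked.example.com/")}


def is_allowed(user, url):
    return (user, url) not in BLOCKED


def handle_line(line):
    """Turn one request line into one reply line, echoing the channel tag."""
    parts = line.split()
    if not parts:
        return None  # blank line, nothing to answer
    channel = parts[0]
    if len(parts) != 3:
        return channel + " ERR"  # unexpected format (assumes "%LOGIN %URI")
    _, user, url = parts
    return channel + (" OK" if is_allowed(user, url) else " ERR")


def serve(stdin, stdout):
    """Answer every request as it arrives; never wait to collect a 'batch'."""
    for line in stdin:
        reply = handle_line(line)
        if reply is not None:
            stdout.write(reply + "\n")
            stdout.flush()  # the reply must not sit in a buffer


# Demo with in-memory streams; a real helper would instead run
# serve(sys.stdin, sys.stdout) after "import sys".
demo_in = io.StringIO(
    "0 louis http://www.example.com/\n"
    "1 louis http://blocked.example.com/\n"
)
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

This sketch answers in arrival order, but because every reply carries 
its own tag, a helper that hands slow lookups to worker threads or 
child processes could send its replies back in any order.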

Any deeper insights, would be tremendously appreciated.

Thanks,

Chris