Hey Alex,
Thank you for your reply; I am sorry, I think I explained myself poorly.
What I meant by option C is to have basically three functions: two for handling stdin/stdout, and one
that fetches the data from the DB every 60 seconds and saves it into a global variable for the other functions
to use (roughly sketched below).
Then, when a new request comes in on stdin, the handler will simply read from that variable instead of from the DB.
I see the following benefits in this approach:
1. We will have only one DB connection every 60 seconds, per Squid worker instance.
2. It will be very fast, since the stdin handler will simply read from a local variable.
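Roughly, the helper might look like this minimal sketch (all names are made up, the DB query is left as a
stub, and it assumes the plain basic auth helper protocol, i.e. "username password" in, "OK"/"ERR" out):

    #!/usr/bin/env python3
    # Sketch of option C: one refresher thread plus a stdin/stdout loop.
    import sys
    import threading
    import time

    CREDENTIALS = {}               # username -> password, the shared variable
    LOCK = threading.Lock()

    def fetch_credentials_from_db():
        # Stub: the real helper would open one MariaDB connection here,
        # SELECT the user table, and return {username: password}.
        return {}

    def refresher(interval=60):
        global CREDENTIALS
        while True:
            fresh = fetch_credentials_from_db()  # one DB round trip per interval
            with LOCK:
                CREDENTIALS = fresh
            time.sleep(interval)

    def main():
        threading.Thread(target=refresher, daemon=True).start()
        for line in sys.stdin:                   # "username password" per request
            try:
                # note: Squid may URL-escape these fields; decoding omitted here
                username, password = line.strip().split(' ', 1)
            except ValueError:
                sys.stdout.write('ERR\n')
                sys.stdout.flush()
                continue
            with LOCK:
                ok = CREDENTIALS.get(username) == password
            sys.stdout.write('OK\n' if ok else 'ERR\n')
            sys.stdout.flush()

    if __name__ == '__main__':
        main()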
> you will have as many database clients as there are workers in your Squid instance
You are definitely right, but since this will be much faster, I think I will be able to decrease my number of workers significantly.
Also, we might be able to use concurrency=n here to decrease it further?
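If I understand the docs right, for a basic auth helper that would be a squid.conf line along these lines
(numbers just for illustration):

    auth_param basic children 2 startup=1 idle=1 concurrency=50

so each helper multiplexes many requests over a single stdin/stdout pair instead of Squid starting one
helper per concurrent request.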
Would love to hear your thoughts on this,
Roee
On Tue, Feb 8, 2022 at 6:38 PM Alex Rousskov <rousskov@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
On 2/8/22 11:08, roee klinger wrote:
> I thought about the following approach:
>
> 1. Have only one python helper, this helper fetches the data every
> minute from the main DB.
> 2. This helper has concurrency set for it.
> 3. The helper then spawns child processes using multithreading, each
> process responds to std/stdout and reads the data from the main process
> which spawned it.
>
> What do you think about taking this route?
>
> It will require no extra DBs and no tweaks to Squid, but maybe I am
> missing something
With this approach (let's call it C), you will have as many database
clients as there are workers in your Squid instance, just like in option
A. Option C is probably a lot easier to implement for a given helper
than the generic option A. Option B gives you one database client per
Squid instance.
It is not clear to me why C parallelizes reading/writing from/to
stdin/stdout -- I doubt that task is the bottleneck in your environment.
I would expect a single stdin reader thread and a single stdout writer
thread instead.
This is not my area of expertise, but if you do go the option C route, you
may need to protect the helper's stdin/stdout descriptors with a mutex so
that threads can read/write from/to stdin/stdout without getting
mangled/partial reads and mangled/overlapping writes.
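In a Python helper, for example, the write side of that might look roughly like this (illustrative only):

    import sys
    import threading

    stdout_lock = threading.Lock()   # shared by all worker threads

    def send_reply(reply):
        # Serialize writes so threads cannot interleave partial lines.
        with stdout_lock:
            sys.stdout.write(reply + '\n')
            sys.stdout.flush()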
Alex.
> On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov wrote:
>
> On 2/8/22 09:50, roee klinger wrote:
>
> > Alex: If there are a lot more requests than your users/TTLs should
> > generate, then you may be able to decrease db load by figuring out
> > where the extra requests are coming from.
>
> > actually, I don't think it matters much now that I think about it
> > again, since as per my requirements, I need to reload the cache every
> > 60 seconds, which means that even if it is perfect, MariaDB will
> > still get a high load. I think the second approach will be better
> > suited.
>
> Your call. Wiping out the entire authentication cache every 60 seconds
> feels odd, but I do not know enough about your environment to judge.
>
>
> > Alex: aggregating helper-db connections (helpers can be written to
> > talk through a central connection aggregator)
> >
>
> > That sounds like exactly what I am looking for, how would one go about
> > doing this?
>
> You have at least two basic options:
>
> A. Enhance Squid to let SMP workers share helpers. I assume that you
> have C SMP workers and N helpers per worker, with C and N significantly
> greater than 1. Instead of having N helpers per worker and C*N helpers
> total, you will have just one concurrent helper per worker and C helpers
> total. This will be a significant, generally useful improvement that
> should be officially accepted if implemented well. This enhancement
> requires serious Squid code modifications in a neglected error-prone
> area, but it is certainly doable -- Squid already shares rock diskers
> across workers, for example.
>
> B. Convert your helper from a database client program to an Aggregator
> client program (and write the Aggregator). Depending on your needs and
> skill, you can use TCP or Unix Domain Sockets (UDS) for
> helper-Aggregator communication. The Aggregator may look very similar to
> the current helper, except it will not use stdin/stdout for
> receiving/sending helper queries/responses. This option also requires
> development, but it is much simpler than option A.
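>
> For illustration only, the helper side of option B might look roughly like
> this in Python (the socket path and line protocol are made up):
>
>     #!/usr/bin/env python3
>     # Hypothetical option-B helper: keeps stdin/stdout for Squid, but
>     # forwards every query to a local Aggregator over a Unix socket.
>     import socket
>     import sys
>
>     AGGREGATOR_SOCKET = '/var/run/auth-aggregator.sock'   # illustrative
>
>     def main():
>         agg = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
>         agg.connect(AGGREGATOR_SOCKET)
>         agg_in = agg.makefile('r')
>         agg_out = agg.makefile('w')
>         for line in sys.stdin:            # one query per line from Squid
>             agg_out.write(line)           # forward it verbatim
>             agg_out.flush()
>             reply = agg_in.readline()     # Aggregator answers one line
>             if not reply:
>                 reply = 'ERR\n'           # aggregator closed the connection
>             sys.stdout.write(reply)
>             sys.stdout.flush()
>
>     if __name__ == '__main__':
>         main()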
>
>
> HTH,
>
> Alex.
>
>
> > On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
> >
> > On 2/8/22 09:13, roee klinger wrote:
> >
> > > I am running multiple instances of Squid in a K8S environment, each
> > > Squid instance has a helper that authenticates users based on their
> > > username and password, the scripts are written in Python.
> > >
> > > I have been facing an issue that, when under load, the helpers (even
> > > with 3600 sec TTL) swamp the MariaDB instance, causing it to reach 100%
> > > CPU, basically I believe because each helper opens up its own connection
> > > to MariaDB, which ends up as a lot of connections.
> > >
> > > My initial idea was to create a Redis DB next to each Squid instance and
> > > connect each Squid to its own dedicated Redis. I will sync Redis with
> > > MariaDB every minute, thus decreasing the connections count from a few
> > > 100s to just 1 every minute. This will also improve speeds since Redis
> > > is much faster than MariaDB.
> > >
> > > The problem is, however, that there will still be many connections from
> > > Squid to Redis, and that will probably consume a lot of DB resources
> > > as well, which I don't actually know how to optimize, since it seems
> > > that Squid opens many processes, and there is no way to get them to talk
> > > to each other (except TTL values, which seem not to help in my case,
> > > which I also don't understand why that is).
> > >
> > > What is the best practice to handle this, considering I have the
> > > following requirements:
> > >
> > > 1. Fast
> > > 2. Refresh data every minute
> > > 3. Consume the least amount of DB resources possible
> >
> > I would start from the beginning: Does the aggregate number of database
> > requests match your expectations? In other words, do you see lots of
> > database requests that should not be there given your user access
> > patterns and authentication TTLs? In yet other words, are there many
> > repeated authentication accesses that should have been authentication
> > cache hits?
> >
> > If there are a lot more requests than your users/TTLs should generate,
> > then you may be able to decrease db load by figuring out where the extra
> > requests are coming from. For example, it is possible that your
> > authentication cache key includes some noise that renders caching
> > ineffective (e.g., see comments about key_extras in
> > squid.conf.documented). Or maybe you need a bigger authentication cache.
> >
> > If the total stream of authentication requests during peak hours is
> > reasonable, with few unwarranted cache misses, then you can start
> > working on aggregating helper-db connections (helpers can be written to
> > talk through a central connection aggregator) and/or adding database
> > power (e.g., by introducing additional databases running on previously
> > unused hardware -- just like your MariaDB idea).
> >
> >
> > Cheers,
> >
> > Alex.
> >
>