Re: Ideas for Squid statistics Web UI development

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Mon, 19 Nov 2012 14:34:15 +1300

On 19.11.2012 13:05, George Machitidze wrote:
Hello

I've started development of an open source Web UI for gathering stats
for the Squid proxy server and need your help to clarify needs and
resources.

Where it came from:
Enterprises require auditing, reporting, configuration
check/visibility and statistics. I can say that most of these things
are easy to implement and provide in different ways, except reporting
and stats. Additionally, the solutions I've found do not meet some
requirements for functionality and a nice interface. Their state of
maintenance, future development etc. are also very unclear and
ineffective, but still acceptable or enough for _some_
installations. If you know something that can do all this stuff -
please let me know.
So, I've decided to write everything from scratch, and may take
some publicly licensed parts from other projects.

You did not consider joining any existing FOSS project and providing 
the productivity boost to fix the shortcomings you noticed?

The core problem with any FOSS project is its volunteer nature. People 
work towards something that works fine for their own needs and omit the 
minor details others need for the product to be portable between 
installations.
 I mention this as something you should look at seriously because 'from 
scratch' is a multi-year project with a long initial period where your 
own product is just another partially-baked piece of code. You can save 
yourself a lot of time (and marketing hassle) by improving something 
already written and promoting that.

I suspect many of the problems existing reporters have are also due to 
unreported Squid API limitations. But again we (Squid Project) need 
feedback, patches and developer assistance improving those in Squid so 
other projects can report the data efficiently.

Architecture:
Starting point is gathering stats, then we need to manipulate and
store it, then we can add some regular jobs (will avoid this) and
then we need to view this.

Gathering data
Available sources:
1. Logs, available via files or logging daemon (traffic, errors)
2. Stats available via SNMP  (status/counters/config)
3. Cache Manager (status/counters/config)
4. OS-level things (footprint, processes, disk, cpu etc)
[anything else?]

(2) and (3) are *supposed* to present the same information in 
alternative machine and human readable formats. BUT .. uhm there are 
holes.
I am interested in patches sent to squid-dev improving either (2) or 
(3) outputs (http://wiki.squid-cache.org/MergeProcedure).

NP: The actually important errors are not logged to the daemon. They 
are logged to cache.log instead. You will need *2* forms of log 
processing to retrieve administrative error reports, one for access.log 
traffic issues and one for cache.log systemic issues.
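
A rough sketch of the access.log half of that processing, assuming the
default native "squid" logformat (time, elapsed ms, client,
result/status, bytes, method, URL, ...). The AccessEntry class and
parse_access_line() function are names made up for illustration, and
the sample line is invented. cache.log is free-form text and would
need its own, separate handling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AccessEntry:
    """One access.log record (hypothetical container, not a Squid API)."""
    timestamp: datetime
    elapsed_ms: int
    client: str
    result_code: str
    http_status: int
    size_bytes: int
    method: str
    url: str


def parse_access_line(line: str) -> AccessEntry:
    """Parse one line, assuming the default native 'squid' logformat."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("not a native-format access.log line: %r" % line)
    # Field 4 looks like TCP_MISS/200: Squid result code, then HTTP status.
    result_code, _, status = fields[3].partition("/")
    return AccessEntry(
        timestamp=datetime.fromtimestamp(float(fields[0]), tz=timezone.utc),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        http_status=int(status),
        size_bytes=int(fields[4]),
        method=fields[5],
        url=fields[6],
    )


# Invented sample line, for illustration only.
SAMPLE = ("1353288855.123    245 192.0.2.10 TCP_MISS/200 5120 GET "
          "http://example.com/ - DIRECT/192.0.2.80 text/html")

entry = parse_access_line(SAMPLE)
print(entry.client, entry.result_code, entry.http_status, entry.size_bytes)
```

A site using a custom logformat directive would need a different field
mapping.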

This part will be done by a local logging daemon, I won't use file
logging for known reasons.
BTW, a good starting point is log_mysql_daemon by marcello, available
under the GPL, written in perl. Effective enough to start and load any
data to a DB - it's simple enough that it took me 10-15 minutes to
analyze the code, set it up and configure it.

Data storage
File-based logging is very ineffective and has several huge
disadvantages:
- Ineffective use of disk resources
- Poor/no indexing
- Logrotation/DWH/archiving
- Not human readable, some parts need calculations anyway
- etc

To store and then view data efficiently, a DB is actually required.
As a first step I'll use MySQL, then will migrate the code to support
PgSQL (and maybe others too) through a DB abstraction layer.

We can store all of the access logs and also have some dynamically
updated counters, because periodic jobs are very intensive and
require time too.

I don't want to put counter-updating code on the logging daemon, will
try to use DB-side for that as it's done in log_mysql_daemon.

NP: this daemon is actually database agnostic. The early release was 
erroneously called 'mysql' because it was implemented on that database. 
Since 3.2 it is called log_db_daemon.

To use PgSQL or any other database just alter the provided .sql 
template files and use pgsql in the squid.conf access_log DSN parameter.

If there are any database-specific schema changes that would improve 
efficiency and reporting of this tool ... again I am interested in 
patches sent to squid-dev (http://wiki.squid-cache.org/MergeProcedure).
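
The DB-side counter idea discussed above could look roughly like the
following. This is only a sketch of the general technique (a trigger
that updates a counter table on every insert), not the schema shipped
with the daemon. It uses Python's built-in sqlite3 as a stand-in so it
runs anywhere; the table, column and trigger names are all invented,
and MySQL or PgSQL would need their own trigger syntax.

```python
import sqlite3

# Hypothetical schema: one raw log table plus a per-client counter table
# that the database itself keeps up to date through a trigger.
SCHEMA = """
CREATE TABLE access_log (
    id          INTEGER PRIMARY KEY,
    client      TEXT    NOT NULL,
    result_code TEXT    NOT NULL,
    size_bytes  INTEGER NOT NULL
);
CREATE TABLE client_counters (
    client      TEXT PRIMARY KEY,
    requests    INTEGER NOT NULL DEFAULT 0,
    total_bytes INTEGER NOT NULL DEFAULT 0
);
CREATE TRIGGER access_log_count AFTER INSERT ON access_log
BEGIN
    INSERT OR IGNORE INTO client_counters (client) VALUES (NEW.client);
    UPDATE client_counters
       SET requests = requests + 1,
           total_bytes = total_bytes + NEW.size_bytes
     WHERE client = NEW.client;
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def log_request(conn, client, result_code, size_bytes):
    """What a logging daemon would do: insert the raw row and nothing else."""
    conn.execute(
        "INSERT INTO access_log (client, result_code, size_bytes) "
        "VALUES (?, ?, ?)",
        (client, result_code, size_bytes),
    )


conn = open_db()
log_request(conn, "192.0.2.10", "TCP_MISS", 5120)
log_request(conn, "192.0.2.10", "TCP_HIT", 1024)
log_request(conn, "192.0.2.11", "TCP_MISS", 300)

counters = {
    row[0]: (row[1], row[2])
    for row in conn.execute(
        "SELECT client, requests, total_bytes FROM client_counters")
}
print(counters)
```

The point of the design is that the daemon only inserts rows, while the
counters a UI reads are maintained by the database without a periodic
job.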

If someone needs this data for monitoring purposes and it is not
available via SNMP/OS through Nagios/Cacti/Zabbix/whatever - I see no
problem doing that too.

AFAIK the needs in this area are centered around useful templates or 
plugins for polling the Squid OIDs with those tools. There is a lot of 
very useful OID data which can already be pulled out of Squid but 
nothing easily available in the FOSS area to do that display.

Cacti has a few old templates available (if one is willing to hunt them 
down and fix a few bugs) for HIT ratio and overall traffic/disk usage 
but client info and error reporting are very noticeably absent. I'm not 
sure about the other tools.

Web UI
Technologies: PHP/CSS/JS/Ajax etc
PHP will select data from DB and generate pages accordingly.

TODO:
1. Collect information about UI requirements - what users want to see
and control
2. Define all the counters and logging variables for the daemon part
required for implementing the first needs, according to point 1
3. Define DB-side counters, sources
4. Check data types and length for DB for optimization
5. Continuous improvement

Any involvement: information about user needs, suggestions,
recommendations, coding, ideas are appreciated :)

I chose GitHub for hosting the project, will write project docs and
plans there. Currently I am collecting very detailed information on
user needs.

Thanks

Best regards,
George Machitidze

Wonderful to hear about more progress in the administration sphere.

I have an ongoing project by Francesco Chemolli (kinkie) to improve the 
cachemgr and SNMP information feeds. Would you be interested in 
collaboration on the Squid internal upgrades needed to support our three 
administration interfaces?

The prime objectives of our feature project in no particular order are:
 * to upgrade the cachemgr reports output such that it can be used as 
an Open Web API for managing Squid via plugin Web UI.
 * to create an HTML + XHR alternative to cachemgr.CGI.
 * to synchronize the cachemgr and SNMP reporting such that all data is 
equally available through either - as alternative APIs rather than 
supplementary ones.
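
To illustrate the first objective: today a Web UI has to scrape the
cachemgr text reports itself. A minimal sketch of that, assuming the
"key = value" text layout of the counters report. The sample text
below is invented, not captured from a real proxy, and
parse_counters() is a made-up helper name.

```python
import json


def parse_counters(text: str) -> dict:
    """Turn 'key = value' report lines into a dict of numbers/strings."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines or anything without '='
        key, value = key.strip(), value.strip()
        try:
            counters[key] = float(value) if "." in value else int(value)
        except ValueError:
            counters[key] = value  # keep non-numeric values as text
    return counters


# Invented sample in the assumed layout; values are for illustration only.
SAMPLE = """\
sample_time = 1353288855.123456 (Mon, 19 Nov 2012 01:34:15 GMT)
client_http.requests = 2000
client_http.hits = 500
client_http.errors = 4
"""

counters = parse_counters(SAMPLE)
hit_ratio = counters["client_http.hits"] / counters["client_http.requests"]

# JSON is what an XHR-based UI would want to receive directly.
as_json = json.dumps(counters, indent=2, sort_keys=True)
print(as_json)
print("hit ratio: %.2f" % hit_ratio)
```

If the reports were served in a machine-readable form by Squid itself,
this scraping step would not be needed in every UI.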

Amos Jeffries
Treehouse Networks Ltd.