On 11/08/2016 12:13 PM, Moreno Andreo wrote:
Hi, I'm checking whether there's a better way to obtain statistics from my database tables. Here's the scenario.

First of all, the technical details:
- PostgreSQL 9.1, Ubuntu 12 on a 4-core, 32 GB machine with a 600 GB disk, migrating to PostgreSQL 9.5.3, Debian 8 on an 8-core, 52 GB machine with a 2 TB disk.
- 350 databases and 350 users; every user connects to his own database and his teammates' (max 10 in total), so each user can connect to at most 10 databases at a time.

My application needs to compute a number of statistics (how many records are in a certain state, how many are in another state) to send back to the user. At the moment this is done with a SELECT count(*) FROM ... (involving 4 joins on 4 tables), run every 20 seconds by each client connected to the cluster (at the moment about 650 clients configured, about 200 concurrent) against each database it has rights to connect to.

I noticed that in some cases, especially when working with not-so-small datasets (200k rows x 95 cols), and sometimes returning not-so-small result sets (10k rows), the query does not perform well; the worst thing, though, is that it raises overall server load (I/O) and bandwidth usage. While bandwidth is not a problem (I see spikes at 20 Mbps while "normal" traffic is at about 3 Mbps, and I have 1 Gbps available), server load *is* a main problem: in high-access periods (summer/holidays) I see my server load go up to 22-25 on a 4-core machine, and users call complaining about timeouts and slowness. Even though I'm migrating to a better instance, I'm still trying to "normalize" this feature.

I can start looking at indexes (I'm not quite sure that all the fields in the WHERE clause are indexed), but I don't think that alone would boost performance. I thought about having a table, say tbl_counters, like this:

CREATE TABLE tbl_counters (
    coduser  uuid,
    counter1 int,
    counter2 int,
    ....
    counterx int
);

updated by a trigger: when a value in a table is INSERTed/UPDATEd/DELETEd, it fires a function that increments/decrements the value of counter x for user y (a sketch follows below). To guard against the trigger occasionally missing an update, once a day (say at 2 a.m.) a cron job runs the "monster query" above for every database and, if necessary, adjusts each user's counters to reflect the real values.

In your experience, would this approach help me lower server load? Is there any other approach I could try?
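A minimal sketch of that trigger, assuming a hypothetical source table "records" with "coduser" and "state" columns and a single counter; the real schema, state values, and counter logic will of course differ:

CREATE OR REPLACE FUNCTION trg_update_counters() RETURNS trigger AS $$
BEGIN
    -- UPDATE or DELETE: uncount the old version of the row
    -- (assumes a row per coduser already exists in tbl_counters)
    IF TG_OP <> 'INSERT' THEN
        IF OLD.state = 1 THEN
            UPDATE tbl_counters SET counter1 = counter1 - 1
             WHERE coduser = OLD.coduser;
        END IF;
    END IF;
    -- INSERT or UPDATE: count the new version of the row
    IF TG_OP <> 'DELETE' THEN
        IF NEW.state = 1 THEN
            UPDATE tbl_counters SET counter1 = counter1 + 1
             WHERE coduser = NEW.coduser;
        END IF;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_counters
    AFTER INSERT OR UPDATE OR DELETE ON records
    FOR EACH ROW EXECUTE PROCEDURE trg_update_counters();

One caveat worth noting: every writer to "records" now also updates the same tbl_counters row, so heavy concurrent writes for a single user will serialize on that row's lock.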
Instead of pushing, why not pull? In other words, do the users really check/need the statistics every 20 seconds? Given that you say exact values are not important over the course of the day, why not create a mechanism for the user to poll the database only when they need the information?
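A rough sketch of one such mechanism, assuming the 9.5 migration is done (materialized views need 9.3+, the FILTER clause 9.4+); "records" and its "state" column again stand in for the real 4-table join:

CREATE MATERIALIZED VIEW user_stats AS
SELECT coduser,
       count(*) FILTER (WHERE state = 1) AS counter1,
       count(*) FILTER (WHERE state = 2) AS counter2
  FROM records              -- stand-in for the real 4-table join
 GROUP BY coduser;

-- client side, only when the user actually opens the statistics screen:
REFRESH MATERIALIZED VIEW user_stats;
SELECT counter1, counter2 FROM user_stats WHERE coduser = $1;

That way the expensive query runs once per explicit request instead of once per client every 20 seconds.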
If more details are needed, just ask. Thanks in advance, and sorry for the long message (but I had to explain such a complex thing). Moreno.-
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx