On 11/08/2016 12:13 PM, Moreno Andreo wrote:
Hi, I'm checking whether there's a better way to obtain statistics from my database tables. Here's the scenario.

First of all, the technical details:
- PostgreSQL 9.1, Ubuntu 12 on a 4-core, 32 GB machine with a 600 GB disk, migrating to PostgreSQL 9.5.3, Debian 8 on an 8-core, 52 GB machine with a 2 TB disk.
- 350 databases and 350 users; every user connects to his own database and his teammates' (max 10 in total), so each user can connect to at most 10 databases at a time.

My application needs to compute a number of statistics (how many records are in a certain state, how many are in another state) to send back to the user. At the moment this is done with a SELECT count(*) FROM ... (involving 4 joins on 4 tables), run every 20 seconds by each client connected to the cluster (at the moment about 650 clients configured, about 200 concurrent) against each database it has rights to connect to.

I noticed that in some cases, especially when working with not-so-small datasets (200k rows x 95 cols), and sometimes returning not-so-small result sets (10k rows), the query does not perform well; the worst thing, though, is that it raises overall server load (I/O) and bandwidth usage. While bandwidth is not a problem (I see spikes at 20 Mbps while "normal" traffic is at about 3 Mbps, and I have 1 Gbps available), server load *is* a main problem: in high-access periods (summer/holidays) I see my server load go up to 22-25 on a 4-core machine, and users call complaining about timeouts and slowness. Even though I'm migrating to a better instance, I'm still trying to "normalize" this feature.

I can start looking at indexes (I'm not quite sure that all the fields in the WHERE clause are indexed), but I don't think that alone would boost performance. I thought about having a table, say tbl_counters, like this:

CREATE TABLE tbl_counters (
    coduser  uuid,
    counter1 int,
    counter2 int,
    ....
    counterx int
);

updated by a trigger: when a value in a table is INSERTed/UPDATEd/DELETEd, it fires a function that increments/decrements the value of counter x for user y (a sketch follows below). To guard against the trigger occasionally missing an update, once a day (say at 2 a.m.) a cron job runs the "monster query" above for every database and, if necessary, adjusts each user's counters to reflect the real values.

In your experience, would this approach help me lower server load? Is there any other approach I could try?
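A minimal sketch of that trigger, assuming a hypothetical source table "records" with "coduser" and "state" columns and a single counter; the real schema, state values, and counter logic will of course differ:

CREATE OR REPLACE FUNCTION trg_update_counters() RETURNS trigger AS $$
BEGIN
    -- UPDATE or DELETE: uncount the old version of the row
    -- (assumes a row per coduser already exists in tbl_counters)
    IF TG_OP <> 'INSERT' THEN
        IF OLD.state = 1 THEN
            UPDATE tbl_counters SET counter1 = counter1 - 1
             WHERE coduser = OLD.coduser;
        END IF;
    END IF;
    -- INSERT or UPDATE: count the new version of the row
    IF TG_OP <> 'DELETE' THEN
        IF NEW.state = 1 THEN
            UPDATE tbl_counters SET counter1 = counter1 + 1
             WHERE coduser = NEW.coduser;
        END IF;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_counters
    AFTER INSERT OR UPDATE OR DELETE ON records
    FOR EACH ROW EXECUTE PROCEDURE trg_update_counters();

One caveat worth noting: every writer to "records" now also updates the same tbl_counters row, so heavy concurrent writes for a single user will serialize on that row's lock.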
Instead of pushing, why not pull? In other words, do the users really check/need the statistics every 20 seconds? Given that you say exact values are not important over the course of the day, why not create a mechanism for the user to poll the database only when they need the information?
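A rough sketch of one such mechanism, assuming the 9.5 migration is done (materialized views need 9.3+, the FILTER clause 9.4+); "records" and its "state" column again stand in for the real 4-table join:

CREATE MATERIALIZED VIEW user_stats AS
SELECT coduser,
       count(*) FILTER (WHERE state = 1) AS counter1,
       count(*) FILTER (WHERE state = 2) AS counter2
  FROM records              -- stand-in for the real 4-table join
 GROUP BY coduser;

-- client side, only when the user actually opens the statistics screen:
REFRESH MATERIALIZED VIEW user_stats;
SELECT counter1, counter2 FROM user_stats WHERE coduser = $1;

That way the expensive query runs once per explicit request instead of once per client every 20 seconds.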
If more details are needed, just ask. Thanks in advance, and sorry for the long message (but I had to explain such a complex thing). Moreno.-
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx