
Re: Best practices to manage custom statistics


 



Hi Adrian,
First of all: I've now noticed that not all of the fields touched by WHERE clauses in my queries are indexed. Time to try some indexes and do a bit of EXPLAINing...
(I must be blind... I've checked it multiple times...)
Scroll down for the rest of the thread...
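For anyone reading along in the archives, the kind of check I mean looks roughly like this (the table and column names are made up for illustration):

```sql
-- Hypothetical table/column names. Before indexing, the planner
-- can only do a sequential scan for this WHERE clause:
EXPLAIN ANALYZE
SELECT * FROM records WHERE owner_id = 42;

-- Add an index on the filtered column...
CREATE INDEX records_owner_id_idx ON records (owner_id);

-- ...and re-check: the plan should now typically show an index scan,
-- provided the predicate is selective enough.
EXPLAIN ANALYZE
SELECT * FROM records WHERE owner_id = 42;
```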

On 23/11/2016 16:59, Adrian Klaver wrote:
On 11/23/2016 05:24 AM, Moreno Andreo wrote:
On 15/11/2016 18:19, Adrian Klaver wrote:
On 11/15/2016 07:39 AM, Moreno Andreo wrote:
Sorry for the late reply... I'm having some quite rough days...

On 08/11/2016 21:28, Adrian Klaver wrote:
On 11/08/2016 12:13 PM, Moreno Andreo wrote:
[...]

In your experience, would this approach help me lower server load?
Are there any other approach I can try?



So does the user need only their own data, or do they need the other
users' data as well?
They can also download their group mates' data (stats, numbers), if
they want. The numbers can be confusing because every user can have
more than one workplace.



How can they be working with 'old' data? The queries you are running
are compiling stats on data that exist at the time they are run and at
any point in time between stats runs the user is working with current
data regardless of what the last stats say.
Since we are on ADO.NET (with Npgsql) and we don't keep connections open
(query, fill a structure, and disconnect), in the time slice between two
updates they will have both data and stats that are not "real" at that
moment...
I'll try to give an example.
You and I are teammates, each working at his own place (several miles
apart). You read data from my database and get both stats and work data.
Suddenly I make a change. This change won't be visible to you until you
refresh the data (that used to mean pressing the "Update" button; now it's
automated every 60 seconds).
Hope it's clearer now...


Aah, I get it now. You are refreshing forms every 60 seconds for 350 users, each with their own database. Actually worse than that, as there is user overlap across databases (up to 10 per user), so the same database can be hit multiple times in a given refresh. Seems physics is at work here, as you have already pointed out. Namely, fetching all that data at regular intervals taxes the bandwidth as well as the CPU/storage.
Exactly. At the moment the bottleneck is I/O (running on a VM over RAID-5 with 15k RPM SAS disks); in the next few weeks I plan to set up a test server with SSDs ("attached SSD" on Google Cloud Platform). That's quite expensive, but the data sheets point to a very good IOPS rate (rising as size, and price, rise). CPU is not a problem (almost never over 70-80%; the average is about 40% on a 4-core server).

High levels solutions that come to mind to spread the load out:

1) Replication to spread data across multiple machines.
Or just split the databases over multiple non-replicated Postgres instances on separate machines
Already in place, but only for the owner's database. The other databases can't be "downloaded", essentially for privacy reasons.

2) Caching results from each individual database so subsequent calls for the information do not touch the database. You already touched on this with your counter table.
Yes, and I'm all ears for suggestions on what to do and on any hidden caveats... I've heard that stored procedures (and triggers, I suppose) are faster to execute than the same "operation" coming from outside, because the backend does not have to parse it each time, and in this case it can be executed only when needed.
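As a sketch, the trigger-maintained counter idea could look like this (the documents table and user_id column are hypothetical; the ON CONFLICT upsert needs PostgreSQL 9.5 or later):

```sql
-- Hypothetical per-user stats cache, kept current by a trigger,
-- so clients read one precomputed row instead of re-running aggregates.
CREATE TABLE stats_cache (
    user_id    integer PRIMARY KEY,
    row_count  bigint NOT NULL DEFAULT 0,
    updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION bump_stats() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO stats_cache (user_id, row_count)
             VALUES (NEW.user_id, 1)
        ON CONFLICT (user_id)
        DO UPDATE SET row_count  = stats_cache.row_count + 1,
                      updated_at = now();
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE stats_cache
           SET row_count = row_count - 1, updated_at = now()
         WHERE user_id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_stats
AFTER INSERT OR DELETE ON documents
FOR EACH ROW EXECUTE PROCEDURE bump_stats();
```

The 60-second refresh then becomes a single-row read of stats_cache per user, at the cost of a small write amplification on every insert/delete.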

3) Spreading out the refresh interval. Not sure if the 60 second interval is synced across users. Still, maybe set different refresh intervals and/or change them on the fly when load increases. A back-off equation, so to speak.
I'm afraid that when load is extremely high (I've seen top report a 5-minute load average of 45 on a 4-core machine; it was even hard to type in the PuTTY console!), we won't get any update at all. Yes, under those conditions we're already not getting updates, but my goal is to find a way to reduce the overall query load, so I can have a good (v-)hardware design and better performance at lower cost. "Pushing" data would be the best way... but I need to have a look at LISTEN/NOTIFY and at what it means with Npgsql and JDBC. (One thing I didn't mention is that between the user and the database we have a web server, on the same LAN as the PostgreSQL server, hosting the web services through which the user interacts with the database. Maybe the web server can "proxy" connections (keep them open) so the listen/notify method can be achieved. Yes, it's a fair bit of work to rewrite all the connection policies, but if it's necessary...)
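For the record, the database side of such a push scheme could be sketched like this (the documents table, user_id column, and channel name are all hypothetical):

```sql
-- Hypothetical push-style change notification: a trigger NOTIFYs a
-- channel on every change; long-lived client connections LISTEN on it.
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
    -- Payload identifies what changed, so listeners can refresh selectively.
    PERFORM pg_notify('data_changed', TG_TABLE_NAME || ':' || NEW.user_id);
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_notify
AFTER INSERT OR UPDATE ON documents
FOR EACH ROW EXECUTE PROCEDURE notify_change();

-- Each persistent connection held by the web server then runs:
LISTEN data_changed;
-- and refreshes only when a notification for a relevant user arrives,
-- instead of polling every 60 seconds.
```

As far as I understand, Npgsql can surface these notifications on a kept-open connection through its notification event, which is exactly why the web server would need to hold connections open rather than connect/disconnect per query.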

I think that a combination of the last two should be the best.
Tell me what you think.....



Except when they are wrong. :-) Still, been there.
Don't tell me.... :-)



Thanks in advance, and sorry for the long message (but I had to explain
such a complex thing).
Moreno.-







--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


