Search Postgresql Archives

Re: Huge number of INSERTs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/18/2011 04:30 AM, Phoenix Kiula wrote:
On Fri, Nov 18, 2011 at 9:25 AM, Steve Crawford
<scrawford@xxxxxxxxxxxxxxxxxxxx>  wrote:


Database only? Or is it also your webserver?

It's my webserver and DB. Webserver is nginx, proxying all PHP
requests to apache in the backend.
You still didn't answer what "massive traffic" means.

What version of PostgreSQL? What OS? What OS tuning, if any, have you done?
(Have you increased readahead? Changed swappiness, turned off atime on your
mounts, made syslogging asynchronous, etc?). Does your RAID have
battery-backed cache? What are the cache settings?

PG 9.0.5

CentOS 5 64 bit

OS tuning - lots of it since the beginning of time. What specifically
would you like to know? Please let me know and I can share info. Like
SHM Max and Min variables type of things?

RAID has the 3Com battery backed cache, yes. Not reporting any errors.
3Com? Perhaps you mean 3Ware? And (again) what are the RAID cache *settings*? In particular, the write-back/write-through setting.



What is the nature of the queries? Single record inserts or bulk? Same for
the selects. Have you run analyze on them and optimized the queries?

Simple INSERTs. Into a table with 6 columns. Column 1 is a primary
key, column 5 is a date. There are two indexes on this table, on the
pkey (col1) and one on the date (col5).

SELECTs are simple straight selects, based on pkey with limit 1. No
joins, no sorting.



What is
the typical duration of your queries? Are lots of queries duplicated
(caching candidates)?

The bulk of the big SELECTs are in "memcached". Much faster than PG.

It's INSERTs I don't know what to do with. Memcached is not a good
solution for INSERTs, which do need to go into a proper DB.
So most of your selects aren't hitting the database. Since we are talking db tuning, it would have been nice to know how many queries are hitting the database, not the number of requests hitting the webserver. But the question was "what is the typical duration of the queries" - specifically the queries hitting the database.


What is the size of your database? Do you have any
bandwidth bottleneck to the Internet?

Full DB:   32GB
The big table referenced above:  28 GB	

It's inserts into this one that are taking time.
Earlier you said you were doing 200 inserts/minute. Is that an average throughout the day or is that at peak time. Peak load is really what is of interest. 200 inserts/minute is not even 4/second.

Is this your database server only or is it running web and/or other
processes? How long does a typical web-request take to handle?

How can I measure the time taken per web request? Nginx is super fast,
based on apache bench. Apache -- how do I test it? Don't want to do
fake inserts. With selects, apache bench uses memcached instead..
Look at your log. If it isn't set to record request time, set it to do so. I set my Apache servers to log request time in microseconds.

At first blush, and shooting in the dark, I'll guess there are lots of
things you can do. Your shared_buffers seems a bit low - a rough starting
point would be closer to 25% of your available RAM.

If I make shared_buffers too high, (2GB....25% of my 8GB RAM), this
brings the server to its knees instantly. Probably because I have
apache, nginx, memcached running on the same server. Nginx and
memcached are negligible in terms of memory consumption.
Not total RAM, "*available* RAM" - that is the memory available after loading the OS, Nginx, Apache, etc. Earlier you had a snapshot from "top" which showed over 5G cached and swap basically unused which means all your programs combined are using well under half your RAM and the remaining RAM is acting as cache. But that output was from a point-in-time. You would need to observe it over time and under load.
You are a prime candidate for using a connection pooler. I have had good
luck with pgbouncer but there are others.

Will pgbouncer or pgpool help with INSERTs?
Only indirectly. As mentioned by myself and others, you have a real problem with the number of simultaneous connections. A connection pooler will allow you to have fewer database connections open and thus use resources more efficiently. It will also reduce the overhead from connections. In the simplest case, I've seen a 10x improvement in database connection setup with pgbouncer. And a pooler can let you use persistent connections and virtually eliminate the connection setup overhead. You have to be careful, though. If you use the most aggressive pooling settings, one web process can affect the operation of another. In particular, things like "SET ... TO ..." statements or creation of temporary tables will be associated with the backend connection to the database. With aggressive settings, the same web request could have each database statement handled by a different database backend. So start off with conservative pool settings and advance only if required and have studied the potential side-effects.

BTW, what things are competing for disk? Perhaps you could run iostat for a few minutes at peak load. If you haven't turned off atime on your mounting, every request will probably generate several write requests just to update the access time for each file that gets read. If you are logging your web requests and getting, say, 1200 requests/minute (20/second) to syslog and it is set to synchronous writes which is often the default then logging alone is triggering lots of fsync activity. If you can live with losing a few log entries after a crash, switch logging to asynchronous.

Also, are you using APC? It won't help PostgreSQL inserts directly but by pre-compiling/caching the PHP code you will free up resources for your other processes including PostgreSQL.

Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux