On Fri, Jul 27, 2012 at 9:33 PM, David Barton <dave@xxxxxxxxxxxx> wrote:
> Hi,
>
> I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector
> is generating very high IO usage even when nothing appears to be
> happening on the system.
>
> I have roughly 150 different databases, each of which is running in
> 1 of roughly 30 tablespaces.  The databases are small (the dumps of
> most are under 100M, all but 3 are under 1G, and nothing is larger
> than 2G).

It isn't the size of the data that matters, but the number of objects.
It sounds like your databases have about 150 statistics-containing
objects each, in order to come up with a 3.5MB stats file.

What do you gain by using databases rather than schemas to do the
segregation?

> Previously iotop reported the disk write speed at ~6MB / second.

So that corresponds to about 2 physical write-outs of the stats file
per second.

Are you using ext4?  It has the peculiar (to me) property that when a
file is renamed out of existence, it writes out all of that file's
obsolete dirty buffers, rather than just dropping them as
uninteresting to anyone.  That generates about 10 times the physical
IO that the ext3 file system does.  And of course infinitely more
physical IO than a tmpfs.

> FWIW, I just migrated all these databases over to this new server by
> restoring from pg_dump.  I was previously experiencing this on 8.3,
> which was why I upgraded to 9.1, and I also have another server with
> similar problems on 9.1.
>
> Any help would be sincerely appreciated.

I think the first line of defense would be to use /dev/shm to hold the
stats file (a rough sketch of the setup is below).  I don't see any
downside to that.  You are reading and writing that file so
ferociously anyway that it is always going to be taking up RAM, no
matter where you put it.  Indeed, under ext4 you might even have
several copies of it all locked into RAM as they wait to reach the
disk before being dropped.

Increasing the naptime, as you have already done, will also decrease
the physical IO, but that has the trade-off of risking bloat.  (Since
you are running 150 databases on one machine, though, I doubt any of
them is active enough for the risk of bloat to be all that great.)
Using /dev/shm, however, should eliminate the IO entirely with no
trade-off at all.  But even with /dev/shm, the CPU usage of repeatedly
formatting, writing, reading, and parsing the stats file will still be
considerable, while increasing the naptime reduces that as well.

As far as coding changes to overcome the fundamental problem:

A relatively easy change would be to make any given autovacuum worker,
on start up, tolerate a stats file that is out of date by up to, say,
naptime/5.  That would greatly reduce the amount of writing the stats
collector needs to do (assuming that few tables actually need
vacuuming during any given cycle), but it wouldn't change the amount
of reading a worker needs to do, because each worker still has to read
the file each time; it doesn't inherit the stats from anyone.  I don't
think it would be a problem that a table which becomes eligible for
vacuuming in the last 20% of a cycle would have to wait for one more
round, especially as this change might motivate one to reduce the
naptime, since doing so would be cheaper.

But it seems like maybe the stats collector could use a ground-up
recoding.  Maybe it could use a shared relation to store the stats
within the database cluster itself, so that edits could be done in
place per database rather than re-writing the entire cluster's stats?
But I certainly am not volunteering to take on that task.
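To make the /dev/shm suggestion above concrete, something like the
following should do it (untested here; the directory name and data
directory path are just examples, adjust them for your installation):

  # Create a tmpfs-backed directory for the stats temp file.  Note
  # that /dev/shm is cleared on reboot, so the directory needs to be
  # recreated (e.g. from an init script) before postgres starts.
  mkdir /dev/shm/pg_stats_tmp
  chown postgres:postgres /dev/shm/pg_stats_tmp
  chmod 700 /dev/shm/pg_stats_tmp

  # In postgresql.conf:
  #   stats_temp_directory = '/dev/shm/pg_stats_tmp'
  #   autovacuum_naptime = 5min   # example only; larger values cut the
  #                               # IO and CPU further but risk bloat

  # Both settings take effect on a reload; no restart is needed:
  pg_ctl reload -D /path/to/data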
A compromise might be to have one stats file per database.  That way
any given backend only needs to read in the file for the database it
cares about, and the stats collector only needs to write out the one
database asked of it.  This change could be mostly localized to just
pgstat.c, I think.

Cheers,

Jeff

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance