Re: Postgres 9.1.4 - high stats collector IO usage

Hi Jeff,

Thanks for the detailed reply.

On 13/08/12 05:23, Jeff Janes wrote:
On Fri, Jul 27, 2012 at 9:33 PM, David Barton <dave@xxxxxxxxxxxx> wrote:
Hi,

<snip>
I have roughly 150 different databases, each of which is running in 1 of
roughly 30 tablespaces.  The databases are small (the dumps of most are
under 100M, all but 3 are under 1G, and nothing is larger than 2G).
It isn't the size of the data that matters, but the number of objects.
  It sounds like your databases have about 150 statistics-containing
objects each, in order to come up with a 3.5MB stats file.

What do you gain by using databases rather than schema to do the segregation?
I had never imagined that there was such a profound difference between using schemas and using databases. I imagine that I could convert from using databases to using schemas.

Previously, iotop reported the disk write speed at ~6MB/second.
So that corresponds to about 2 physical write-outs of the stats file
per second.  Are you using ext4?  It has the peculiar (to me) property
that when a file is renamed out of existence, it writes out all of
that file's obsolete dirty buffers, rather than just dropping them as
uninteresting to anyone.  That generates about 10 times the physical
IO as the ext3 file system does.  And of course about infinite times
the physical IO as a tmpfs.
It was previously on ext3 and was moved to ext4. That didn't seem to make a difference; I'm guessing the higher IO on the new server was just because it was capable of doing it.
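For context, here is a minimal C sketch of the write-then-rename pattern Jeff describes; the file names are made up and this is not the actual pgstat.c code, just an illustration of why a rename happens on every stats write cycle:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *tmpname  = "pgstat.tmp";   /* hypothetical temp name  */
        const char *statname = "pgstat.stat";  /* hypothetical final name */

        FILE *fp = fopen(tmpname, "w");
        if (fp == NULL)
            return EXIT_FAILURE;

        /* ... the whole stats snapshot gets written here ... */
        fputs("dummy stats payload\n", fp);

        if (fclose(fp) != 0)
            return EXIT_FAILURE;

        /* Atomically replace the previous stats file.  Readers always
         * see a complete file, but the old file is "renamed out of
         * existence" every cycle, which is what triggers the extra
         * write-out on ext4. */
        if (rename(tmpname, statname) != 0)
            return EXIT_FAILURE;

        return EXIT_SUCCESS;
    }

On ext3 or a tmpfs the replaced file's dirty pages can simply be dropped, which is where the roughly 10x difference comes from.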

FWIW, I just migrated all these databases over to this new server by
restoring from pg_dump.  I was previously experiencing this on 8.3, which was
why I upgraded to 9.1, and I also have another server with similar problems
on 9.1.

Any help would be sincerely appreciated.
I think the first line of defense would be using /dev/shm to hold the
stats file.  I don't see any downside to that.  You are reading and
writing that file so ferociously anyway that it is always going to be
taking up RAM, no matter where you put it.  Indeed, under ext4 you
might even have several copies of it all locked into RAM as they
wait to reach the disk before being dropped.
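To make that concrete, the change being suggested is roughly the following (the directory name is only an example; if I recall correctly, stats_temp_directory only needs a reload, and the permanent copy is still written under the data directory at shutdown):

    # postgresql.conf -- point the stats temp file at a tmpfs
    stats_temp_directory = '/dev/shm/pg_stats_tmp'

    # create the directory first, owned by the postgres user, e.g.:
    #   mkdir /dev/shm/pg_stats_tmp
    #   chown postgres: /dev/shm/pg_stats_tmp

Note that /dev/shm is cleared on reboot, so the directory would have to be recreated at boot (e.g. from an init script) before PostgreSQL starts.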

Increasing the naptime, as you have already done, will also decrease
the physical IO, but that has the trade-off of risking bloat.  (But
since you are running 150 databases on one machine, I doubt any of
them are active enough for the risk of bloat to be all that great).
However using /dev/shm should eliminate the IO entirely with no
trade-off at all.
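For reference, the knob in question is autovacuum_naptime; the value below is purely illustrative of the trade-off (fewer stats write/read cycles versus less frequent autovacuum passes over each database):

    # postgresql.conf -- example value only; the default is 1min
    autovacuum_naptime = 5min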

But with /dev/shm the CPU usage of repeatedly formatting, writing,
reading, and parsing the stat file will still be considerable, while
increasing the naptime will reduce that as well.
The CPU overhead seems pretty minimal, and a slight reduction in naptime should be more than enough.

As far as coding changes to overcome the fundamental problem:

A relatively easy change would be to make any given autovacuum worker
on start up tolerate a stats file that is out of date by up to, say,
naptime/5.  That would greatly reduce the amount of writing the stats
collector needs to do (assuming that few tables actually need
vacuuming during any given cycle), but wouldn't change the amount of
reading a worker needs to do because it still needs to read the file
each time as it doesn't inherit the stats from anyone.  I don't think
it would be a problem that a table which becomes eligible for
vacuuming in the last 20% of a cycle would have to wait for one more
round.  Especially as this change might motivate one to reduce the
naptime since doing so will be cheaper.
If the stats are mirrored in memory, then that makes sense. Of course, if that's the case, couldn't we just alter the stats collector to flush at most once per N seconds/minutes? If the stats are not mirrored in memory, doesn't that imply that most of the databases will never flush updated stats to disk, and so the file will become stale?
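To make the naptime/5 idea concrete, here is a hypothetical C sketch of the freshness test a worker might apply; none of these names exist in PostgreSQL, and the real logic would live in the stats-file read path in pgstat.c:

    #include <stdbool.h>
    #include <stdio.h>

    typedef long long TimestampTz;   /* microseconds, like PostgreSQL's type */

    /* Hypothetical helper: would an autovacuum worker accept an existing
     * stats snapshot, given the proposed tolerance of naptime/5? */
    static bool
    stats_fresh_enough(TimestampTz file_ts, TimestampTz now, int naptime_secs)
    {
        TimestampTz max_age = (TimestampTz) naptime_secs * 1000000 / 5;

        return (now - file_ts) <= max_age;
    }

    int main(void)
    {
        /* with the default 60s naptime the cutoff is 12s, so a 10s-old
         * file is accepted ... */
        printf("%d\n", stats_fresh_enough(0, 10 * 1000000LL, 60));
        /* ... but a 20s-old file would force a fresh write */
        printf("%d\n", stats_fresh_enough(0, 20 * 1000000LL, 60));
        return 0;
    }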

But it seems like maybe the stats collector could use a ground-up
recoding.  Maybe it could use a shared relation to store the stats
within the database cluster itself, so that edits could be done in
place per database rather than re-writing the entire cluster's stats?
  But I certainly am not volunteering to take on that task.

A compromise might be to have one stats file per database.  That way
any given backend only needs to read in the database file it cares
about, and the stats collector only needs to write out the one
database asked of it.  This change could be mostly localized to just
pgstat.c, I think.
That approach was what I had thought of.  I am not a C programmer by any stretch of the imagination, which is why I asked if there was a place to find this kind of thing. It seems likely there are a few features that people would be willing to put money towards.
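For what it's worth, a hypothetical sketch of the per-database layout being discussed: one small file per database, named by its OID, so the collector rewrites only the file for the database that asked and a backend reads only its own. The path and naming here are purely illustrative:

    #include <stdio.h>

    int main(void)
    {
        unsigned int dboid = 16384;   /* example database OID */
        char path[64];

        /* one small stats file per database instead of one
         * cluster-wide pgstat.stat */
        snprintf(path, sizeof(path), "pg_stat_tmp/db_%u.stat", dboid);
        printf("%s\n", path);
        return 0;
    }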


Cheers,

Jeff


Regards,
David


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

