Re: Defining performance.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Paul Lathrop - Thu at 02:59:27PM -0800]
> growing disk space usage. The DBA had come to the conclusion that the
> VACUUM command did/does not work on these systems, because even after a
> VACUUM FULL, the size of the database was continually increasing. So, as
> things stand with the PG7.2 machines, vacuuming is run nightly, and
> whenever the database size reaches 40Gb on disk (the point at which
> performance has degraded below tolerance), the DBA exports the data,
> deletes the database, and then imports the data, shrinking it to the
> actual size of the dataset.

We found one reason why vacuuming didn't always work for us - we had
long running transactions - in addition to killing the vacuum, it did
really nasty things to the performance in general.

To check for those transactions, I think it's needed to turn on
stats_command_string in the config.

I use this query to check:

select * from pg_stat_activity where current_query<>'<IDLE>' order by
query_start ;

If you spot any "<IDLE> in transaction" with an old query_start
timestamp, then that's most probably the reason.

Long running transactions doesn't have to be idle ... check the pg_locks
view for the lowest transactionid and compare (through the pid) with the
pg_stat_activity view to find the actual backend.

> However, we still are suffering a gradual decrease in performance over
> time - or so the application engineers claim. The DBA and I have been
> banging our heads against this for a month.

We're having the same issues, so we do the dumping and restoring every
now and then to be sure everything is properly cleaned up.  With 8.1.

> 1) How does one define 'performance' anyway? Is it average time to
> complete a query? If so, what kind of query? Is it some other metric?

We have the same kind of problem, and the project leader (I sometimes
refer him as the "bottleneck" ;-) is most concerned about iowait at our
cpu graphs.  Anyway, we do have other measures:

 - our applications does log the duration of each request towards the
   application as well as each query towards the database.  If the
   request (this is web servers) is taking "too long" time, it's logged
   as error instead of debug.  If a significant number of such errors
   is due to database calls taking too much time, then the performance
   is bad.  Unfortunately, we have no way to automate such checking.

 - I've setting up two scripts pinging that pg_stat_activity view every
   now and then, logging how much "gruff" it finds there.  Those two
   scripts are eventually to be merged.  One is simply logging what it
   finds, the other is a plugin system to the Munin graphing package. 

I've thrown the scripts we use out here:

http://oppetid.no/~tobixen/pg_activity_log.txt
http://oppetid.no/~tobixen/pg_activity.munin.txt

(I had to rename them to .txt to get the web server to play along).

Those are very as-is, should certainly be modified a bit to fit to any
other production environment. :-)

The pg_activity_log dumps a single number indicating the "stress level"
of the database to a file.  I think this stress number, when taking out
i.e. the 20% worst numbers from the file for each day, can indicate
something about the performance of the database server.  However, I
haven't had the chance to discuss it with the bottleneck yet.



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux