Re: what server stats to track / monitor ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 12 Jun 2009, Alan McKay wrote:

So, from the perspective of both Linux and PG, is there canonical list
of "here are the most important X things to track" ?

Not really, which is why you haven't gotten such a list from anyone here. Exactly what's important to track does vary a bit based on expected workload, and most of the people who have been through this enough to give you a good answer are too busy to write one (you've been in my "I should respond to that" queue for two weeks before I found time to write).

Is there some good Linux performance monitoring and tuning reading
that you can recommend?

The only good intro to this I've ever seen, from the perspective of monitoring things would be useful to a database administrator, is the coverage of monitoring in "Performance Tuning for Linux Servers" by Johnson/Huizenga/Pulavarty. Their tuning advice wasn't so useful, but most OS tuning suggestions aren't either.

The more useful way to ask the question you'd like an answer to is "when my server starts to perform badly, what does that correlate with?" Find out what you need to investigate to figure that out, and you can determine what you should have been monitoring all along. That is unfortunately workload dependant; the stuff that tends to go wrong in a web app is very different from what happens to a problematic data warehouse for example.

The basic important OS level stuff to watch is:

-Total memory in use
-All the CPU% numbers
-Disk read/write MB/s at all levels of granularity you can collect (total across the system, filesystem, array, individual disk). You'll only want to track the total until there's a problem, at which point it's nice to have more data to drilldown into.

There's a bunch more disk and memory stats available, I rarely find them of any use. The one Linux specific bit I do like to monitor is the line labeled "Writeback" in /proc/meminfo/ , because that's the best indicator of how much write cache is being done at the OS level. That's a warning sign of many problems in an area Linux often has problems with.

On the database side, you want to periodically check the important pg_stat-* views to get an idea how much activity and happening (and where it's happening at), as well as looking for excessive dead tuples and bad index utilization (which manifests by things like too many sequential scans):

-pg_stat_user_indexes
-pg_stat_user_tables
-pg_statio_user_indexes
-pg_statio_user_tables

If your system is write-intensive at all, you should watch pg_stat_bgwriter too to keep an eye on when that goes badly.

At a higher level, it's a good idea to graph the size of the tables and indexes most important to your application over time.

It can be handy to track things derived from pg_stat_activity too, like total connections and how old the oldest transaction is. pg_locks can be handy to track stats on too, something like these two counts over time:

select (select count(*) from pg_locks where granted) as granted,(select count(*) from pg_locks where not granted) as ungranted;

That's the basic set I find myself looking at regularly enough that I wish I always had a historical record of them from the system. Never bothered to work this into a more formal article because a) the workload specific stuff makes it complicated to explain for everyone, b) the wide variation in and variety of monitoring tools out there, and c) wanting to cover the material right which takes a while to do on a topic this big.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux