
Re: When should I worry?

Greg Smith wrote:

On Mon, 11 Jun 2007, Steve Crawford wrote:

In my experience the more common situation is to "go off a cliff."

Yeah, I think the idea that you'll notice performance degrading and be able to extrapolate future trends using statistical techniques is a bit...optimistic.

Anyway, back to the original question here. If you're worried about catching the point where performance starts becoming an issue, you need some sort of logging of how long statements take to execute. The main choice is whether to log everything, at which point the logging and sorting through all the data generated may become its own performance concern, or to log only statements that take a long time and then count how many of them show up. Either way will give you some sort of early warning once you have a baseline; it may take a bit of tweaking to figure out where to draw the line for what constitutes a "long" statement if you only want to count those.
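For the second approach, the setting to look at is log_min_duration_statement in postgresql.conf; a minimal example (the 200ms threshold is just a starting point to tweak for your workload):

# postgresql.conf: log every statement that runs for 200ms or longer
# (0 would log all statement durations, -1 disables this logging)
log_min_duration_statement = 200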

There are two tools you should look at initially to help process the logging information you get back: pgFouine and PQA. Here are intros to each that also mention how to configure the postgresql.conf file:

http://pgfouine.projects.postgresql.org/tutorial.html
http://www.databasejournal.com/features/postgresql/article.php/3323561

As they're similar programs, it's hard to say which would work better for you; check out both and see which seems more practical or easier to get running. For example, if you only have one of PHP or Ruby installed, that may make one tool or the other preferable.
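If you try pgFouine, its tutorial shows an invocation along these lines (the log path here is just a placeholder):

php pgfouine.php -file /var/log/pgsql/postgresql.log > report.html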

If you can get yourself to the point where you can confidently say something like "yesterday we had 346 statements that took more than 200ms to execute, which is 25% above this month's average", you'll be in a position to catch performance issues before they completely blindside you; makes you look good in meetings, too.
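With log_min_duration_statement set as above, every duration line in the log already exceeds the threshold, so a crude daily count can be as simple as (assuming plain-text logs; the path is illustrative):

grep -c 'duration: ' /var/log/pgsql/postgresql.log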


Starting to sound like a sane idea.
I've been running a test job for almost 24 hours and have accumulated only 8 million rows. That's another 125 days to get to the big 'B'. I think by then I'll have blown a hard drive or worse. I'm running this on some very old hardware that I have available (more on this at the bottom).

However, at this point the machine is running all of the SQL at <0.2 seconds each, which I consider just fine for 7,599,519 rows.

Here are some specifics about the tables:
count(*) from headers:      890300
count(*) from tokens:       890000
count(*) from header_token: 7599519


CREATE TABLE header_token (
    header_idx integer NOT NULL,
    token_idx integer NOT NULL
);

CREATE TABLE headers (
    idx serial NOT NULL,
    hash character varying(64) NOT NULL
);

CREATE TABLE tokens (
    idx bigserial NOT NULL,
    hash character varying(64) NOT NULL
);

ALTER TABLE ONLY headers
    ADD CONSTRAINT headers_hash_key UNIQUE (hash);
ALTER TABLE ONLY headers
    ADD CONSTRAINT headers_pkey PRIMARY KEY (idx);
ALTER TABLE ONLY header_token
    ADD CONSTRAINT pkey_header_token PRIMARY KEY (header_idx, token_idx);
ALTER TABLE ONLY tokens
    ADD CONSTRAINT tokens_hash_key UNIQUE (hash);
ALTER TABLE ONLY tokens
    ADD CONSTRAINT tokens_pkey PRIMARY KEY (idx);
ALTER TABLE ONLY header_token
    ADD CONSTRAINT header_token_header_idx_fkey FOREIGN KEY (header_idx) REFERENCES headers(idx) ON DELETE CASCADE;
ALTER TABLE ONLY header_token
    ADD CONSTRAINT header_token_token_idx_fkey FOREIGN KEY (token_idx) REFERENCES tokens(idx) ON DELETE CASCADE;



The SQL statements I was timing were:
select t.hash, h.hash
from headers h, header_token ht, tokens t
where h.idx = ht.header_idx
and ht.token_idx = t.idx
and h.idx = ?


insert into header_token
select $header, idx from tokens where idx in (...)

The SELECT was <0.2 seconds. The INSERT was easily <0.7 seconds most of the time; it varied because the idx IN (...) list ranged from 200 to 700 entries. The min was <0.2 and the max was >1.0 over a few minutes of observation.
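For checking these numbers on the server side, without the network in the picture, the SELECT can be run under EXPLAIN ANALYZE (the literal idx here is just a sample value):

explain analyze
select t.hash, h.hash
from headers h, header_token ht, tokens t
where h.idx = ht.header_idx
and ht.token_idx = t.idx
and h.idx = 12345;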


All of this was run on a Pentium II 450 MHz with 412MB RAM and a software linear-0 pair of UDMA 66 7200RPM 8MB-cache drives (really old) on separate IDE channels, formatted with ReiserFS. The actual script was running on a separate machine across a 100Base-T full-duplex network, through a firewall machine between the two subnets.

I can't imagine how long it would take to run:
delete from tokens;
with the CASCADE option...
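If that delete ever does need to be fast, I suspect header_token would need an index on token_idx, since the cascade has to find matching rows by token_idx alone and the composite primary key on (header_idx, token_idx) can't serve that lookup; something like:

create index header_token_token_idx on header_token (token_idx);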

