
Re: strategies for dealing with frequently updated tables


 



On 2/28/2012 5:21 PM, Andy Colson wrote:
On 2/28/2012 2:09 PM, Dave Vitek wrote:
The following query takes about 100 minutes (3 seconds per tuple):
SELECT count(id) from T

so table T has 18,000 rows? (100 * 60 * 3)
100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples

Well... math was never my strong point :-)

So you have a large row, who knows how many, and the problem is
"SELECT count(id) from T" is slow?
Correct. I think anything requiring a sequential scan of the table takes
100 minutes to run. However, I wanted to be careful about drawing
conclusions and just present data.


-Andy
ps: based on you're email gramma, I almost changed all your to you're,
just to see if it annoyed you. Sounds like you work for grammar
technology.
How embarrassing! After writing code all day I tend to have some natural
language issues.

No, no, that's not what I meant. Your email address, grammatech.com, makes it sound like you are doing grammar technology.

Have you checked for table bloat? I thought I saw a SQL statement someplace that did that.

I'm not sure what it actually means, but this row would make me worry:

INFO: "T": found 0 removable, 1444 nonremovable row versions in 1522 out of 299964 pages
DETAIL:  30 dead row versions cannot be removed yet.
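
For anyone following along, a minimal way to check bloat directly is the pgstattuple contrib module, assuming it is available on this server (CREATE EXTENSION needs 9.1 or later; older installs load the contrib SQL file instead):

```sql
-- Assumption: the pgstattuple contrib extension can be installed here.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Reports live vs. dead tuple counts and free space for table T.
SELECT * FROM pgstattuple('T');

-- A cheaper approximation from the planner's statistics
-- (reltuples/relpages are estimates, refreshed by ANALYZE):
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 't';
```

1444 rows spread over 299964 pages would show up there as a very low tuple density.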

Hopefully somebody else is reading along that can offer insight.

Have you always had this problem, or is it something new? Or is "select count..." new?

Also, not sure if you have seen it elsewhere, but "select count"-ing an entire table is probably one of PG's slowest operations. If you really need the count you might consider a separate summary table that stores it.
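
A rough sketch of such a summary table, kept current with triggers (table, column, and function names here are illustrative, not from this thread):

```sql
-- Illustrative summary table holding the row count of T.
CREATE TABLE t_rowcount (n bigint NOT NULL);
INSERT INTO t_rowcount SELECT count(*) FROM T;

CREATE OR REPLACE FUNCTION t_count_trig() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE t_rowcount SET n = n + 1;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE t_rowcount SET n = n - 1;
  END IF;
  RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_count
AFTER INSERT OR DELETE ON T
FOR EACH ROW EXECUTE PROCEDURE t_count_trig();

-- "SELECT n FROM t_rowcount" then replaces the seq-scan count.
```

Note the trigger adds a small cost to every insert and delete, so it only pays off if the count is read often.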

-Andy


Andy,

Thanks for the discussion. The name has more to do with attribute grammars, iirc. Anyway, I'm sure the table is quite bloated. I tried a query yesterday from somewhere on the internet that claimed to report the amount of bloat, but it reported 100.0% bloat, which seems suspicious given that there is some measurable payload in there. Still, probably not far from the truth.

The count(id) query is an example. The application almost never does sequential scans of this table... except when the server is starting and for one user query that displays a significant portion of the table. I'd rather avoid creating even more indices to avoid the sequential scan -- these indices would interfere with HOT updates.

I've made a schema change that moves a large, immutable text column out of this table. With this change, the remaining columns are simple scalars (integer, float, timestamp) with the exception of a username and file path. I'd be pretty shocked if any tuple had more than a kilobyte of payload now. It eliminates use of TOAST... I'll be clustering and I guess we'll see what happens over the next year or so.
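
Concretely, the split might look roughly like this (column and index names are guesses, not the actual schema):

```sql
-- Hypothetical: move the large immutable text out of the hot table.
CREATE TABLE t_text (
    id   integer PRIMARY KEY REFERENCES T (id),
    body text NOT NULL
);
INSERT INTO t_text (id, body) SELECT id, big_text FROM T;
ALTER TABLE T DROP COLUMN big_text;

-- Rewrite T to reclaim dead space and order it by its primary key
-- (CLUSTER rewrites the whole table and takes an exclusive lock):
CLUSTER T USING t_pkey;
```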

- Dave

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

