
Re: strategies for dealing with frequently updated tables


 



On 2/28/2012 5:21 PM, Andy Colson wrote:
On 2/28/2012 2:09 PM, Dave Vitek wrote:
The following query takes about 100 minutes (3 seconds per tuple):
SELECT count(id) from T

so table T has 18,000 rows? (100 * 60 * 3)
100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples

Well... math was never my strong point :-)

So you have a large row, who knows how many, and the problem is
"SELECT count(id) from T" is slow?
Correct. I think anything requiring a sequential scan of the table takes
100 minutes to run. However, I wanted to be careful about drawing
conclusions and just present data.


-Andy
ps: based on you're email gramma, I almost changed all your to you're,
just to see if it annoyed you. Sounds like you work for grammar
technology.
How embarrassing! After writing code all day I tend to have some natural
language issues.

No, no, that's not what I meant. Your email address, grammatech.com, makes it sound like you are doing grammar technology.

Have you checked for table bloat? I thought I saw a SQL statement someplace that did that.

I'm not sure what it actually means, but this row would make me worry:

INFO: "T": found 0 removable, 1444 nonremovable row versions in 1522 out of 299964 pages
DETAIL:  30 dead row versions cannot be removed yet.
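
For anyone following along, a minimal way to check bloat directly is the pgstattuple contrib module, assuming it is available on this server (CREATE EXTENSION needs 9.1 or later; older installs load the contrib SQL file instead):

```sql
-- Assumption: the pgstattuple contrib extension can be installed here.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Reports live vs. dead tuple counts and free space for table T.
SELECT * FROM pgstattuple('T');

-- A cheaper approximation from the planner's statistics
-- (reltuples/relpages are estimates, refreshed by ANALYZE):
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 't';
```

1444 rows spread over 299964 pages would show up there as a very low tuple density.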

Hopefully somebody else is reading along that can offer insight.

Have you always had this problem, or is it something new? Or is "select count..." new?

Also, not sure if you have seen it elsewhere, but "select count"-ing an entire table is probably one of PG's slowest operations. If you really need the count you might consider a separate summary table that stores it.
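
A rough sketch of such a summary table, kept current with triggers (table, column, and function names here are illustrative, not from this thread):

```sql
-- Illustrative summary table holding the row count of T.
CREATE TABLE t_rowcount (n bigint NOT NULL);
INSERT INTO t_rowcount SELECT count(*) FROM T;

CREATE OR REPLACE FUNCTION t_count_trig() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE t_rowcount SET n = n + 1;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE t_rowcount SET n = n - 1;
  END IF;
  RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_count
AFTER INSERT OR DELETE ON T
FOR EACH ROW EXECUTE PROCEDURE t_count_trig();

-- "SELECT n FROM t_rowcount" then replaces the seq-scan count.
```

Note the trigger adds a small cost to every insert and delete, so it only pays off if the count is read often.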

-Andy


Andy,

Thanks for the discussion. The name has more to do with attribute grammars, iirc. Anyway, I'm sure the table is quite bloated. I tried a query yesterday from somewhere on the internet that claimed to report the amount of bloat, but it reported 100.0% bloat, which seems suspicious given that there is some measurable payload in there. Still, probably not far from the truth.

The count(id) query is an example. The application almost never does sequential scans of this table... except when the server is starting and for one user query that displays a significant portion of the table. I'd rather avoid creating even more indices to avoid the sequential scan -- these indices would interfere with HOT updates.

I've made a schema change that moves a large, immutable text column out of this table. With this change, the remaining columns are simple scalars (integer, float, timestamp) with the exception of a username and file path. I'd be pretty shocked if any tuple had more than a kilobyte of payload now. It eliminates use of TOAST... I'll be clustering and I guess we'll see what happens over the next year or so.
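
Concretely, the split might look roughly like this (column and index names are guesses, not the actual schema):

```sql
-- Hypothetical: move the large immutable text out of the hot table.
CREATE TABLE t_text (
    id   integer PRIMARY KEY REFERENCES T (id),
    body text NOT NULL
);
INSERT INTO t_text (id, body) SELECT id, big_text FROM T;
ALTER TABLE T DROP COLUMN big_text;

-- Rewrite T to reclaim dead space and order it by its primary key
-- (CLUSTER rewrites the whole table and takes an exclusive lock):
CLUSTER T USING t_pkey;
```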

- Dave

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

