On Wed, 9 Jan 2008 10:30:45 -0600
"Scott Marlowe" <scott.marlowe@xxxxxxxxx> wrote:

> Now, everything's a tradeoff.  If PostgreSQL had visibility
> information in the indexes, it would have to lock both the table and
> index for every write, thus slowing down all the other queries that
> are trying to access the table.  It would be a tradeoff that
> sacrificed write speed for read speed.  In a db that was used mostly
> for reading, it would likely be a fair trade.  In a db that did a
> lot of writing, it might slow the whole thing to a crawl.

OK... we are getting near to the point.

I understand the trade-off problem of storing visibility information
in the indexes, i.e. whether the row is still there.

Is there a way to get the count of the rows that *may be* there, I
mean the ones that satisfy the where clause regardless of whether they
have been deleted in the meanwhile, and then reindex on a regular
basis? That would be simple, I think fast, and an acceptable solution
for most CMSes.

> Now, if there's a where clause that's selective enough, then a
> count(*) query may use the index and be a bit faster, but remember,
> in pgsql, it's still got to actually hit the table to see if each
> tuple really is visible to this transaction, so the index needs to
> be fairly selective to be a win.

Well, if the query is not selective enough, I think that problem is
shared with other DBs as well.

> A possible workaround is to have something like a separate table
> with nothing but the IDs and whatever would be in your where clause
> for the tables you're accessing with a foreign key to it, and use
> THAT for a count(*).  Since the rows are skinnier, the count(*)
> will be faster.  Another alternative is to have a trigger fire that
> keeps a track of the size of the table in a summary table when rows
> are added and deleted.

If the where clause is dynamic, how can it help?

> select count(*) from bigtable where inserttime > now() - interval
> '1 week';
>   count
> ---------
>  1254269
> (1 row)
>
> Time: 21422.368 ms
>
> Second run:
> select count(*) from bigtable where inserttime > now() - interval
> '1 week';
>   count
> ---------
>  1254320
> (1 row)
>
> Time: 2001.204 ms
>
> With the data loaded into shared_buffers / linux kernel cache,
> that's not too bad.
>
> Now, I run the same query against our production oracle machine,
> which is a MUCH more powerful server...
>
> SQL> select count(*) from bigtable where inserttime > SYSDATE-7;
>
>   COUNT(*)
> ----------
>    1255972
>
> Elapsed: 00:00:18.62
>
> second run:
> SQL> select count(*) from bigtable where inserttime > SYSDATE-7;
>
>   COUNT(*)
> ----------
>    1255973
>
> Elapsed: 00:00:00.98
>
> Now, obviously, Oracle's got some optimizations for what it's got in
> the buffer there, but the first run isn't really any faster.  In
> fact, for a much more powerful machine, the performance was,
> relatively speaking, pretty bad compared to my little 1 CPU 1 sw
> RAID-10 reporting server.

Interesting... oh, let's put some emotion into it: cool ;)

> So, while PostgreSQL's count(*) performance isn't blindingly fast,
> it's not the dog some people make it out to be either.

Continuing here from your previous post... while you guys are concerned
with competing with big boxes running Oracle, or are thinking about
Google's web farms, I'm thinking about the stuff that sits in between a
mere CMS for mortals and stuff that deals with money, where you don't
have the budget or the traffic to justify complex stats-analysis logic,
or to collect enough stats to make any forecast reasonable.
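Going back to the workarounds for a moment, here is a rough, untested
sketch of what I understand (bigtable and inserttime are from your
example above; the other names are just made up). The trigger plus
summary table idea you mention would look something like this, if I
understand it right:

CREATE TABLE bigtable_count (n bigint NOT NULL);
INSERT INTO bigtable_count SELECT count(*) FROM bigtable;

CREATE OR REPLACE FUNCTION bigtable_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE bigtable_count SET n = n + 1;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE bigtable_count SET n = n - 1;
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bigtable_count_trig
    AFTER INSERT OR DELETE ON bigtable
    FOR EACH ROW EXECUTE PROCEDURE bigtable_count_trig();

-- the count is then just
SELECT n FROM bigtable_count;

That gives the total cheaply, but every insert/delete now updates the
same single row, and it still doesn't answer an arbitrary where clause.

What would come closer to the "count that *may be* wrong" I'm asking
about is, I suppose, reading the planner's row estimate out of EXPLAIN,
something like (again just a sketch):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer
AS $$
DECLARE
    rec record;
    rows integer;
BEGIN
    -- grab the first "rows=" figure from the EXPLAIN output
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql;

SELECT count_estimate('SELECT 1 FROM bigtable
                       WHERE inserttime > now() - interval ''1 week''');

That one takes whatever where clause you pass it, and if I understand
correctly its accuracy depends on how recently ANALYZE has run rather
than on a REINDEX.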
In this context a simpler, faster, even if less accurate, count may be
very handy.

Thanks again

--
Ivan Sergio Borgonovo
http://www.webthatworks.it