On Wed, 9 Jan 2008 13:04:39 +0100 "Harald Armin Massa" <haraldarminmassa@xxxxxxxxx> wrote: > Ivan, > > Please forgive my naiveness in this field but what does it mean an > > "exact count" and what other DB means with "an exact count" and > > how other DB deal with it? > PostgreSQL will give you an exact count of the contents of the > database as it is in the moment you begin your count. (i.e. the > transaction starts) > BUT as the table is not locked, in parallel somebody can bulkload > MANY items into the database, so at the moment (start of your > transaction) > + 1msec your count may be invalid allready. That's reasonable. What other DB do and what is the SQL standard requirement for count(*)? > > I'd expect it perform as good or better than other DB since now > > the bottleneck should be how efficiently it can filter records... > > but still a count(*) with a where clause will incur in the same > > problem of what "exact" means. > > I know of 3 concepts to answer count() faster then PostreSQL: > > 1) just lie, present an estimate > 2) do not have MVCC > 3) store "record deleted info" in index, so you can answer count() > with only scanning the index Sorry if I insist but I think this is a beaten selling point against postgresql. One of the most recurring argument about why someone else db is better than pg is count is slow. Who lies? If it is possible to do otherwise to have count run faster what are the trade off that make it unreasonable to implement it in pg? This is not very useful question but 3) imply that select scan the index return the rows and just later check if they are still there. Is it? And since indexes aren't updated "on the fly" you may get back a larger number than what is actually the real value. Let me consider an everyday use where count() looks as the most obvious solution: paging. I search trough a table and I need to know which is the last page. Concurrently someone is deleting a ton of records. No matter if count() is fast or not when I output the pager it will be "wrong". But still I'll need an estimate of the number of pages, it is not fun if that estimate is wrong *and* slow. And once you add the where clauses there is no way to cache the count. Is there a way to count based on indexes without taking into account deleted rows so to "count" faster? I can make the search faster using indexes as Dann Corbit suggested, but as you imply that won't make count as fast as the "concurrence"[1] that lie or don't use MVCC or store deleted info in indexes. SELECT reltuples FROM pg_class WHERE relname = <table_name>; doesn't apply since you can't add "conditions". Please be patient. I hear this over and over and over. Postgresql is faulty because it can't count fast. And the only reply I've seen are: it is bad design to use count, you're a dumb ass. I admit I may be a dumb ass, but it is hard to sell Postgres if I continue to be a dumb ass ;) - What count(*) really does? Now I understood that count(*) return the # of rows as it sees them at the moment it was invoked. That should be true for other DB as well. That means that unless other DB lock the table they can't take into account records that are going to be deleted once the count has been issued. - When count can be avoided? Well since even other DB may report the "wrong" number, this makes count() look less a Sacre Graal. But still if you need an estimate, wouldn't it be better to have it quickly? How does postgresql compare to other DB when you run: select count(*) from table where conditions once you use indexes? If such kind of query will have anyway to scan the results to see if they are still there since info about deleted records aren't stored in indexes, is there a way to ignore this and just have a faster estimate? I still can't see why it is bad design to use count(). - When count() can't be avoided? All the situation where you may really need count() I think you also need to lock the table but well I'd be curious to see an example where you need count() Still can somebody make an example of bad design and one where count() couldn't be avoided if any? Consider that while it makes few sense to rely on "wrong" numbers in a "business" environment where data integrity/coherence makes *a lot* of sense it is not so clear in a CMS world where most of those critics come from. I know that the arguments to promote postgres in the "business" world where DB2, Oracle and MS SQL play (?) may be different and count() may lose its importance in that context and you could say that other advantages plenty pay off the "slowness" of an operation that in such a context is rare. thanks [1] or does postgres perform as the concurrence once you add where clauses? -- Ivan Sergio Borgonovo http://www.webthatworks.it ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org/