On Tuesday 12 October 2010 14:35:01 you wrote:
> > suggest that 99% of instances of the "select count(*)" idiom are
> > probably bad use of the SQL language.
>
> Well, suppose you paginate results. If the user sees that the search
> query returns 500 pages, there are two options:
>
> - you're Google, and your sorting algorithms are so good that the
>   answer the user wants is on the first page
> - or the user will refine his search by entering more keywords to
>   get a manageable result set
>
> So, in both cases, the count(*) was useless anyway. And the slowest
> ones are the most useless, since the user will immediately discard
> the result and refine his query.
>
> If your full text search is slow, try Xapian or Lucene.

I guess I have to comment here again and point out that while I am
having this issue with text searches, I avoid using count(*) in such
cases; I just use next and previous links (see the LIMIT-plus-one
sketch below).

The real problem (for me) is when someone searches a date or time
range. My application keeps track of huge amounts of realtime
transactional data, so an administrator might want a report on what
some data point did yesterday between 3 and 4 PM. Under normal
conditions the number of matching records can be anywhere from 0 to
over 5,000.

This is really killing me, especially when the reporting people want
a list of how many transactions each point in a given zipcode had
this morning between 8 and 9 AM; that takes about 5 minutes to run on
a server that has enough RAM to hold the entire table!

Pseudo query: show how many transactions per node in zipcode 92252
between 8:00 and 9:00 today (spelled out as real SQL below):

 point_number | number_of_transactions
--------------+------------------------
        65889 | 31
        34814 | 4865
        28349 | 0
         3358 | 364
 ...

24 total rows, > 5 minutes.

Then they want every node to be a link to a list of the actual data
within the specified timeframe. This is where I have to do the same
query twice: first to find out how many rows there are for the page
links, then again to get a page of results (see the windowed-count
sketch below).

Sure, I could keep tables around that hold counts by the hour,
minute, day or whatever to cache results and speed things up, but
then the problem is that when the data is put into the server there
are so many statistics tables to update that the front end becomes a
huge problem. It also makes for a huge mess of tables to think about
when I need to build a report (see the rollup-table sketch below).
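For the pagination case, a minimal sketch of the next/previous-link
trick, no count(*) involved: ask for one row more than the page size,
and render a "next" link only if that extra row comes back. The
search_results relation, the rank column, and the 50-row page size
are made-up stand-ins, not my real schema:

-- Hypothetical names throughout. Fetch 51 rows for a 50-row page;
-- if row 51 exists there is a next page, and only rows 1-50 are shown.
SELECT *
FROM search_results
ORDER BY rank
LIMIT 51 OFFSET 100;  -- page 3 of 50-row pages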
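Here is roughly what the pseudo query looks like as real SQL. The
schema is an assumption for illustration -- points(point_number,
zipcode) and transactions(point_number, ts, ...) -- and the LEFT JOIN
is what keeps the zero-count node (28349 | 0) in the output:

-- Assumed, illustrative schema. count(t.point_number) counts only
-- matched rows, so nodes with no transactions in the hour report 0.
SELECT p.point_number,
       count(t.point_number) AS number_of_transactions
FROM points p
LEFT JOIN transactions t
       ON t.point_number = p.point_number
      AND t.ts >= '2010-10-12 08:00'
      AND t.ts <  '2010-10-12 09:00'
WHERE p.zipcode = '92252'
GROUP BY p.point_number;

-- A composite index lets the per-node counts come from index scans
-- instead of a pass over the whole table:
CREATE INDEX transactions_point_ts_idx
    ON transactions (point_number, ts);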
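For the query-twice problem, one option on PostgreSQL 8.4 or later is
a window function: count(*) OVER () is evaluated over the full match
set before LIMIT/OFFSET applies, so a single query returns both one
page of rows and the total needed for the page links. Same assumed
schema as above:

-- Every returned row carries total_matches, the count of all rows
-- matching the WHERE clause. Note this still scans every matching
-- row; it saves a second query, not the scan itself.
SELECT t.*,
       count(*) OVER () AS total_matches
FROM transactions t
WHERE t.point_number = 65889
  AND t.ts >= '2010-10-12 08:00'
  AND t.ts <  '2010-10-12 09:00'
ORDER BY t.ts
LIMIT 50 OFFSET 0;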
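And on the statistics-table mess: instead of separate tables per
hour, minute, and day, a single rollup keyed by (point_number, hour)
can serve the hourly report directly and be summed for coarser ones,
with one trigger to maintain it on insert. A rough sketch under the
same assumed schema; the naive UPDATE-then-INSERT upsert can race
under concurrent inserts, so treat it as a starting point rather than
production code:

CREATE TABLE transaction_counts (
    point_number integer     NOT NULL,
    hour         timestamptz NOT NULL,  -- date_trunc('hour', ts)
    n            bigint      NOT NULL DEFAULT 0,
    PRIMARY KEY (point_number, hour)
);

CREATE OR REPLACE FUNCTION bump_transaction_count() RETURNS trigger AS $$
BEGIN
    -- Try to bump an existing bucket; create it if it isn't there yet.
    UPDATE transaction_counts
       SET n = n + 1
     WHERE point_number = NEW.point_number
       AND hour = date_trunc('hour', NEW.ts);
    IF NOT FOUND THEN
        INSERT INTO transaction_counts (point_number, hour, n)
        VALUES (NEW.point_number, date_trunc('hour', NEW.ts), 1);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_count_trg
    AFTER INSERT ON transactions
    FOR EACH ROW EXECUTE PROCEDURE bump_transaction_count();

The zipcode report then becomes a join of transaction_counts against
points, and a whole day is just sum(n) over 24 rows per point, so one
table covers the reports instead of a mess of them.

-Neil-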