Re: Performance of count(*)

Steve Atkins <steve@xxxxxxxxxxx> · Thu, 22 Mar 2007 11:37:19 -0700

On Mar 22, 2007, at 11:26 AM, Guido Neitzer wrote:

On 22.03.2007, at 11:53, Steve Atkins wrote:

As long as you're ordering by some row in the table then you can  
do that in
straight SQL.

select a, b, ts from foo where (stuff) and foo > X order by foo  
limit 10

Then, record the last value of foo you read, and plug it in as X  
the next
time around.

This does only work if you have unique values in foo. You might  
have "batch breaks" inside   a list of rows with equal values for foo.

If I don't have unique values in foo, I certainly have unique values  
in (foo, pk).

But: a web application that needs state and doesn't maintain it by  
itself (or inside the dev toolkit) is imho broken by design. How  
should the database store a "state" for a web app? It's only  
possible on the web app part, because the app is either stateless  
and so are the queries to the database - they have to be re- 
evaluated for every request as the request might come from totally  
different sources (users, ...) or it is stateful and has to  
maintain the state because only the app developer knows, what  
information is needed for the "current state".

This is why all web application toolkits have a "session" concept.

Yes. HTTP is not very stateful. Web applications are stateful. There  
are some really obvious approaches to maintaining state cleanly that  
work well with databases and let you do some quite complex stuff  
(tying a persistent database connection to a single user, for  
instance). But they don't scale at all well.

What Craig was suggesting is, basically, to assign a persistent  
database connection to each user. But rather than maintain that  
connection as a running process, to serialise all the state out of  
the database connection and store that in the webapp, then when the  
next action from that user comes in take a database connection and  
stuff all that state into it again.

It's a lovely idea, but strikes me as completely infeasible in the  
general case. There's just too much state there. Doing it in the  
specific case is certainly possible, but rapidly devolves to the  
standard approach of "On the first page of results, run the query and  
record the first 5000 results. Store those in a scratch table,  
indexed by session-id, or in external storage. On displaying later  
pages of results to the same user, pull directly from the already  
calculated results."

I think the problem is more that most web developers aren't very good
at using the database, and tend to fall back on simplistic, wrong,  
approaches
to displaying the data. There's a lot of monkey-see, monkey-do in web
UI design too, which doesn't help.

Sure. That is the other problem ... ;-) But, and I think this is  
much more important: most toolkits today free you from using the  
database directly and writing lots and lots of lines of sql code  
which instantly breaks when you switch the storage backend. It's  
just the thing from where you look at something.

The real problem is the user-interface is designed around what is  
easy to implement in elderly cgi scripts, rather than what's  
appropriate to the data being displayed or useful to the user.  
Displaying tables of results, ten at a time, is just one of the more  
egregious examples of that.

Cheers,
  Steve