--- Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote: > On Nov 9, 2007 11:47 AM, Ted Byers > <r.ted.byers@xxxxxxxxxx> wrote: > > > > --- Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > > > > > Ted Byers <r.ted.byers@xxxxxxxxxx> writes: > [snip] > Which is better depends largely on how your database > is built. MySQL > still uses loops for all subselects, so with large > numbers of tuples > in the subselect method, it will be slow. But they > might fix this in > a later release. > > Fairly recent versions of PostgreSQL could make some > bad choices when > doing joins for certain datasets that would be much > faster with a > correlated subquery (specifically the old left join > where > righttable.field is null trick made some pgsql > versions choose an > inefficient join method) > > So, the "right" way is a question of which db, and > even which version > of that DB you're on. > My life just got soooooooo much more complicated. Oh well, I guess that will be useful when providing advice to management when they start seeing performance issues. Thanks ;-) Have you looked at version 5.0.45? I am always seeing the claim that the left join trick is so much more faster than the correlated subquery, especially if a function like MAX() is used, but the numbers I am seeing with real stock price data has it running, on average, about three times faster than the left join. So I assumed I was doing something wrong in a manner than would get me the right answer the slowest way possible. > > At this time, the database in use is > > irrelevant (I want to stick as close to the ANSI > > standard as practicable so the rewriting required > will > > be minimal should we decide to change the database > > later, for whatever reason). > > If you want to stick with ANSI, MySQL tends to be > more divergent from > the spec than pgsql and other rdbms. > The books I am using, which describe the SQL language, don't seem to mention or illustrate much difference among any of the rdbms (including my references that talk about Oracle and MS SQL Server). The SQL I try from those books seem to work reasonably well in all of them (I can't check against Oracle, though, since I don't have that), and I try most of my SQL against MySQL, Postgres and MS SQL Server (the biggest divergences seem to be in how bulk loading of data happens). Maybe I haven't explored enough of the SQL language, with large enough datasets, to see the differences you mention; or perhaps things are improving with all of them. > Most people would consider the correlate subquery > the better method. > But it's also likely to be the slowest on MySQL. > Right now, with this particular query the correlated subquery is the one that gets me the right answers about 3 times faster than any other method I have tried. But it still takes several minutes to get the results for only a few dozen stocks. And yet I can get several megabytes of data from the following query in about a quarter of the time. SELECT price_date,close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(B.price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date),WEEKOFYEAR(B.price_date)); I had expected this to take many times longer than the "simple" select that gets only the last price for a given stock, but I was surprised to see it so much faster than the query that gets just the last prices for only a couple dozen stocks. Thanks alot. I learned alot from your reply. Ted ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your message can get through to the mailing list cleanly