Tom Lane <tgl@xxxxxxxxxxxxx> writes:
> Alvaro Herrera <alvherre@xxxxxxxxxxxxxxxxx> writes:
> > Gregory Stark wrote:
> >> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
> >> your data distribution. It's not hard to come up with distributions where it's
> >> 1000x as fast and others where there's no speed difference.)
>
> > So the figure is really "1-1000x"? I bet this one is more impressive in
> > PHB terms.
>
> Luke has a bad habit of quoting numbers that are obviously derived from
> narrow benchmarking scenarios as Universal Truths, rather than providing
> the context they were derived in. I wish he'd stop doing that...

In fairness I may have exaggerated a bit there. There is a limit to how much
of a speedup you can get in valid benchmarking situations. A single
sequential scan is always going to be necessary, so you're only saving the
cost of writing out the temporary file and the subsequent merge passes. It's
hard to generate lots of intermediate merge passes since there are only
O(log(n)) of them, and each one costs roughly one extra read and write of the
whole data set. So to get a 1000x speedup on a large I/O-bound sort you would
need something like a thousand merge passes, i.e. you would have to be
sorting something on the order of 2^1000 records, which is ridiculous.
Realistically you should only be able to save 2-5 intermediate merge passes.

On the other hand, there are some common situations where you could see
atypical increases. Consider joining a bunch of small tables to generate a
large result set. The small tables are probably all in memory and the result
set may have only a small number of distinct values. If you throw out the
duplicates early you save *all* the I/O. If you have to do a disk sort
instead it could be many orders of magnitude slower.

This is actually not an uncommon coding idiom for MySQL programmers
accustomed to fast DISTINCT, working around the lack of subqueries and the
poor performance of IN and EXISTS. They often just join together all the
tables in a big cross join and then toss in a DISTINCT at the top to get rid
of the duplicates.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
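
To make that idiom concrete, here is a minimal SQL sketch; the table and
column names (customers, orders, order_items, product_id) are hypothetical
and not taken from this thread. The first query is the join-everything-then-
DISTINCT form, whose duplicate-heavy intermediate result is what can end up
in a disk sort; the second is an EXISTS rewrite that never generates the
duplicates in the first place.

  -- Hypothetical schema, for illustration only:
  --   customers(customer_id, name)
  --   orders(order_id, customer_id)
  --   order_items(order_id, product_id)

  -- MySQL-style idiom: join everything, then DISTINCT away the duplicates.
  -- The join can produce a huge duplicate-heavy result before deduplication.
  SELECT DISTINCT c.customer_id, c.name
    FROM customers c
    JOIN orders o      ON o.customer_id = c.customer_id
    JOIN order_items i ON i.order_id = o.order_id
   WHERE i.product_id = 42;

  -- Equivalent query written with EXISTS (assuming customer_id is unique in
  -- customers): the duplicates are never produced, so there is nothing to
  -- sort or hash away.
  SELECT c.customer_id, c.name
    FROM customers c
   WHERE EXISTS (SELECT 1
                   FROM orders o
                   JOIN order_items i ON i.order_id = o.order_id
                  WHERE o.customer_id = c.customer_id
                    AND i.product_id = 42);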