On 8/16/06, Peter Hardman <peter@xxxxxxxxxxxxxxxxx> wrote:
I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user Paradox to a web based interface to either MySQL or PostgreSQL. The database is a pedigree sheep breed society database recording sheep and flocks (amongst other things). My current problem is with one table and an associated query which takes 10 times longer to execute on PostgreSQL than BDE, which in turn takes 10 times longer than MySQL. The table links sheep to flocks and is created as follows: CREATE TABLE SHEEP_FLOCK ( regn_no varchar(7) NOT NULL, flock_no varchar(6) NOT NULL, transfer_date date NOT NULL, last_changed date NOT NULL, CONSTRAINT SHEEP_FLOCK_pkey PRIMARY KEY (regn_no, flock_no, transfer_date) ) WITHOUT OIDS; ALTER TABLE SHEEP_FLOCK OWNER TO postgres; I then populate the table with COPY SHEEP_FLOCK FROM 'e:/ssbg/devt/devt/export_data/sheep_flock.txt' WITH CSV HEADER The table then has about 82000 records The query I run is: /* Select all sheep who's most recent transfer was into the subject flock */ SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in FROM SHEEP_FLOCK f1 JOIN /* The last transfer date for each sheep */ (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date FROM SHEEP_FLOCK f GROUP BY f.regn_no) f2 ON f1.regn_no = f2.regn_no WHERE f1.flock_no = '1359' AND f1.transfer_date = f2.last_xfer_date The sub-select on it's own returns about 32000 rows. Using identically structured tables and the same primary key, if I run this on Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 3ms, and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same Windows XP Pro machine with 512MB ram of which nearly half is free. The query plan shows most of the time is spent sorting the 30000+ rows from the subquery, so I added a further subquery as follows: /* Select all sheep who's most recent transfer was into the subject flock */ SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in FROM SHEEP_FLOCK f1 JOIN /* The last transfer date for each sheep */ (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date FROM SHEEP_FLOCK f WHERE f.regn_no IN /* Limit the rows extracted by the outer sub-query to those relevant to the subject flock */ /* This typically reduces the time from 1297ms to 47ms - from 35000 rows to 127 rows */ (SELECT s.regn_no FROM SHEEP_FLOCK s where s.flock_no = '1359') GROUP BY f.regn_no) f2 ON f1.regn_no = f2.regn_no WHERE f1.flock_no = '1359' AND f1.transfer_date = f2.last_xfer_date then as the comment suggests I get a considerable improvement, but it's still an order of magnitude slower than MySQL. Can anyone suggest why PostgreSQL performs the original query so much slower than even BDE?
ANALYZE? Regards, Rodrigo