I have the following query: SELECT read_date, to_char(read_date,'YYYY-MM-DD') as text_date, CASE WHEN lat_2>lat_1 THEN ns_dist*-1 ELSE ns_dist END as ns_dist, CASE WHEN lon_2>lon_1 THEN ew_dist*-1 ELSE ew_dist END as ew_dist, CASE WHEN alt_2>alt_1 THEN (ud_dist*-1)::float ELSE ud_dist::float END as ud_dist, lat_e/10::float as lat_e, lon_e/10::float as lon_e, alt_e/10::float as alt_e, CASE WHEN rapid THEN 1 ELSE 0 END as rapid FROM (SELECT g1.read_date as read_date, g1.rapid as rapid, g1.lat as lat_1, g1.lon as lon_1, g1.alt as alt_1, g2.lat as lat_2, g2.lon as lon_2, g2.alt as alt_2, ST_Distance(ST_SetSRID( ST_Point( g1.lon, g1.lat), 4326)::geography, ST_SetSRID( ST_Point( g1.lon, g2.lat), 4326)::geography)/1000 as ns_dist, ST_Distance(ST_SetSRID( ST_Point( g1.lon, g1.lat), 4326)::geography, ST_SetSRID( ST_Point( g2.lon, g1.lat), 4326)::geography)/1000 as ew_dist, g1.alt-g2.alt ud_dist, sqrt(g1.lat_e^2+g2.lat_e^2) as lat_e, sqrt(g1.lon_e^2+g2.lon_e^2) as lon_e, sqrt(g1.alt_e^2+g2.alt_e^2) as alt_e FROM gps_data g1 INNER JOIN gps_data g2 ON g2.read_date=g1.read_date WHERE g1.station=%s AND g2.station=%s ) g3 ORDER BY read_date; Which, when executed on my system for a particular par of g1.station and g2.station takes about 190ms to run (explain output here: https://explain.depesz.com/s/8Nf9 ). This is not horrible, but when running this query for a dozen stations, it becomes noticeable. As such, I was wondering if I could do better. Looking at that explain output, I noticed that the largest time sink was the Merge Join at 114ms. Some research into that gave me the command "set enable_mergejoin=off” which I tried just for fun to see what would happen. Somewhat to my surprise, this actually shaved a good 80ms off the total run time (see https://explain.depesz.com/s/kpncZ). Looking at that, I see it does a Parallel Hash Join, which looks like it should be slower, but I guess since it can do it in parallel, it winds up faster? Not sure on that, but the overall time is definitely significantly faster. Which leaves me with a couple of questions: 1) Since not using a merge join is, in this case apparently better, is there some way I can tweak the query/settings such that it knows this without having to force the merge join off? 2) If not, are there any side effects/gotchas I should be aware of if I simply call that set command before each query (or, presumably before the series of queries)? 3) Is there anything else I can do to optimize the query? I have tried using a pre-calculated point column rather than calculating ST_Point( g1.lon, g1.lat), as well as providing “false” as the last argument to ST_Distance, but while these did speed things up some, it was only about 10ms or so, so not terribly significant. If it helps, the goal behind the somewhat odd arguments to ST_Distance is to find the component North/South and East/West distances, rather than the actual total “point a to point b” distance. Perhaps there is a better way of obtaining that goal? --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 |