On Fri, 29 May 2015 13:37:24 -0500 David Haynes II <dahaynes@xxxxxxx> wrote: > > Let me explain this a bit more clearly. The dataset (table) that we are > using in this analysis is produced from a satellite image (tif) called > MODIS. This image can be placed into PostgreSQL using the raster2pgsql > command (http://postgis.net/docs/using_raster_dataman.html). The > raster2pgsql command reads the tif image and creates a table that can be > used with the PostGIS extensions. There is no difference between the > modis_noout and modis, what has happened is that we have used the > raster2pgsl -R flag in the conversion process. The -R flag creates metadata > for the raster file. For example, the modis dataset as tif is 46 GB, the > same dataset stored as table within the database is 1.74 GB and the using > outdb the metadata table is 52 MB. What is the output of: SELECT pg_total_relation_size('modis') AS "modis", pg_total_relation_size('modis_noout') AS "modis_noout"; on both indb and outdb? > We have seen a decrease in performance time when using OutDB, keep in mind > that the schema and indices on the tables are exactly the same. > > CREATE TABLE modis ( > rid integer NOT NULL, > rast public.raster, > filename text, > CONSTRAINT enforce_height_rast CHECK ((public.st_height(rast) = 250)), > CONSTRAINT enforce_nodata_values_rast CHECK > (((public._raster_constraint_nodata_values(rast))::numeric(16,10)[] = > '{NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}'::numeric(16,10)[])), > CONSTRAINT enforce_num_bands_rast CHECK ((public.st_numbands(rast) = > 12)), > CONSTRAINT enforce_out_db_rast CHECK > ((public._raster_constraint_out_db(rast) = > '{t,t,t,t,t,t,t,t,t,t,t,t}'::boolean[])), > CONSTRAINT enforce_pixel_types_rast CHECK > ((public._raster_constraint_pixel_types(rast) = > '{8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI}'::text[])), > CONSTRAINT enforce_same_alignment_rast CHECK > (public.st_samealignment(rast, > '01000000002EC50BE300F57C4026C50BE300F57CC0E7FBA955801673C13202AA558016634100000000000000000000000000000000BA1A000001000100'::public.raster) > ), > CONSTRAINT enforce_scalex_rast CHECK > (((public.st_scalex(rast))::numeric(16,10) = > 463.312716527917::numeric(16,10))), > CONSTRAINT enforce_scaley_rast CHECK > (((public.st_scaley(rast))::numeric(16,10) = > (-463.312716527917)::numeric(16,10))), > CONSTRAINT enforce_srid_rast CHECK ((public.st_srid(rast) = 6842)), > CONSTRAINT enforce_width_rast CHECK ((public.st_width(rast) = 250)) > ); > CREATE SEQUENCE modis_rid_seq > START WITH 1 > INCREMENT BY 1 > NO MINVALUE > NO MAXVALUE > CACHE 1; > ALTER TABLE ONLY modis ALTER COLUMN rid SET DEFAULT > nextval('modis_rid_seq'::regclass); > ALTER TABLE ONLY modis ADD CONSTRAINT modis_pkey PRIMARY KEY (rid); > CREATE INDEX modis_rast_gist ON modis USING gist > (public.st_convexhull(rast)); > > > Here is the explain analyze > > InDB > Sort (cost=69547.29..69562.80 rows=6204 width=254) (actual > time=131042.478..131042.877 rows=7612 loops=1) > Sort Key: r.id, r.name > Sort Method: quicksort Memory: 787kB > CTE poly > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 width=62247) > (actual time=3.008..8135.679 rows=3109 loops=1) > CTE rast_select > -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272) (actual > time=170.799..128606.266 rows=7677 loops=1) > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250) > (actual time=3.058..8212.030 rows=3109 loops=1) > -> Index Scan using modis_noout_rast_gist on modis_noout r_1 > (cost=0.28..16.56 rows=2 width=22) (actual time=4.329..11.244 rows=2 > loops=3109) > Index Cond: ((rast)::geometry && s.geom) > Filter: _st_intersects(s.geom, rast, NULL::integer) > Rows Removed by Filter: 0 > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) (actual > time=131033.902..131036.105 rows=7612 loops=1) > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033 > width=254) (actual time=170.814..128659.842 rows=7677 loops=1) > Total runtime: 131414.752 ms > > OutDB > Sort (cost=93911.29..93926.80 rows=6204 width=254) (actual > time=866326.762..866327.148 rows=7612 loops=1) > Sort Key: r.id, r.name > Sort Method: quicksort Memory: 787kB > CTE poly > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 width=62247) > (actual time=1.327..6954.020 rows=3109 loops=1) > CTE rast_select > -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086) (actual > time=257.610..863474.778 rows=7677 loops=1) > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250) > (actual time=1.341..7030.138 rows=3109 loops=1) > -> Index Scan using modis_rast_gist on modis r_1 > (cost=0.28..24.40 rows=2 width=836) (actual time=1.481..3.952 rows=2 > loops=3109) > Index Cond: ((rast)::geometry && s.geom) > Filter: _st_intersects(s.geom, rast, NULL::integer) > Rows Removed by Filter: 0 > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) (actual > time=866317.923..866320.316 rows=7612 loops=1) > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033 > width=254) (actual time=257.625..863555.082 rows=7677 loops=1) > Total runtime: 866691.113 ms > > On Fri, May 29, 2015 at 6:40 AM, PT <wmoran@xxxxxxxxxxxxxxxxx> wrote: > > > On Thu, 28 May 2015 10:06:24 -0500 > > David Haynes II <dahaynes@xxxxxxx> wrote: > > > > > > The query run times are significantly slower on outdb as that using indb > > > here are the run times on 2 queries. > > > > > > ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US > > > Counties) > > > OutDB: 873.564s (14 minutes 33s) InDB: 127.36s (2 minutes 7s) > > > > > > ST_Count(select single band here)/ST_Clip(on all bands)/Inner > > > Join/ST_Transform (US Counties) > > > OutDB: 9537.371s (2 hours 38minutes) InDB: 310s (5 minutes 10 seconds) > > > > > > In the query planner it shows a large change in the number of columns > > > (width) that are picked up in the CTE_rast_select. > > > These extra columns slow down the ability to process the data. > > > > As I mentioned before, that's not how that works. Those are estimates. If > > those rows are actually different between the two tables, then your data > > is different between the two databases and you can't expect the performance > > to be the same. > > > > Additionally, the part you snip out below isn't the part that's different. > > In particular, the difference is coming from the fact that one of the > > plans uses modis and the other uses modis_noout. > > > > Does modis exist in indb? Does modis_noout exist on outdb? What is the > > difference between these two tables? Because _that_ is where the time > > difference is most likely happening (based on the explain output). > > > > Additionally, run EXPLAIN ANALYZE on these queries to get the actual > > times in addition to the estimates. Furthermore, given that a lot of > > the confusion in this question is due to a lack of iformation, it would > > be a good idea to include the table definitions. > > > > > OUT DB > > > CTE rast_select > > > > -> Nested Loop (cost=0.28..76131.41 rows=62033 *width=1086)* > > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 > > width=250) > > > > > > > > > In DB > > > Nested Loop (cost=0.28..51767.41 rows=62033 *width=272*) > > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 > > width=250) > > > > > > On Wed, May 27, 2015 at 4:31 PM, PT <wmoran@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > On Tue, 26 May 2015 12:52:24 -0500 > > > > David Haynes II <dahaynes@xxxxxxx> wrote: > > > > > > > > > Hello, > > > > > > > > > > I have a question about the query optimizer and its performance on > > > > spatial > > > > > datasets, specifically rasters. My use case is rather unique, the > > > > > application that I am developing allows users to request > > summarizations > > > > of > > > > > various geographic boundaries around the world. Therefore our raster > > > > > datasets are global. We are in the process of conducting some > > benchmarks > > > > > for our system and we noticed something unexpected. > > > > > > > > > > The query is the same except the first is run on a raster (46gigs) > > in out > > > > > of database (outdb) and the second is the same raster (46gigs) > > stored in > > > > > database (indb). The raster is multibanded (13), with each band > > > > > representing one entire MODIS global scene. A single year of MODIS is > > > > > approximately 3.6 gigs. > > > > > > > > > > The outdb is being out performed by indb, because the query optimizer > > > > gets > > > > > smarter. But what is also interesting is all the extra pieces that > > are > > > > > brought in with outdb. > > > > > > > > > > with poly as > > > > > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom FROM > > > > > us_counties ) > > > > > , rast_select as > > > > > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from > > > > > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast, > > > > s.geom) ) > > > > > select r.id, r.name, ST_Count(r.rast, 1, True) > > > > > > > > > > > > > > > QUERY PLAN With Outdb > > > > > > > > > > > -------------------------------------------------------------------------------------------------- > > > > > Sort (cost=93911.29..93926.80 rows=6204 width=254) > > > > > Sort Key: r.id, r.name > > > > > CTE poly > > > > > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 > > > > width=62247) > > > > > CTE rast_select > > > > > -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086) > > > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 > > width=250) > > > > > -> Index Scan using modis_rast_gist on modis r_1 > > > > > (cost=0.28..24.40 rows=2 width=836) > > > > > Index Cond: ((rast)::geometry && s.geom) > > > > > Filter: _st_intersects(s.geom, rast, NULL::integer) > > > > > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) > > > > > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033 > > > > > width=254) > > > > > > > > > > QUERY PLAN With Indb > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------- > > > > > Sort (cost=69547.29..69562.80 rows=6204 width=254) > > > > > Sort Key: r.id, r.name > > > > > CTE poly > > > > > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 > > > > width=62247) > > > > > CTE rast_select > > > > > -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272) > > > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 > > width=250) > > > > > -> Index Scan using modis_noout_rast_gist on modis_noout > > r_1 > > > > > (cost=0.28..16.56 rows=2 width=22) > > > > > Index Cond: ((rast)::geometry && s.geom) > > > > > Filter: _st_intersects(s.geom, rast, NULL::integer) > > > > > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) > > > > > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033 > > > > > width=254) > > > > > > > > I could be missing something here, but I don't see how long the queries > > > > actually take to run. Have you actually run the queries and timed them? > > > > Keep in mind that analyze does not actually run the query, it only > > plans > > > > it, so the actual run time is unknown if all you do is analyze. > > > > > > > > The query plans appear to be equal, assuming there are slight variances > > > > in the names of tables from one DB to another (and I assume that your > > > > description of indb and outdb reflects the fact that there are (for > > > > reasons unknown) two copies of the data). > > > > > > > > The only purpose to those estimates is to choose a good plan. If the > > > > plan is bad for one database and both databases have the same data, > > then > > > > the plan will be bad for both. > > > > > > > > Since there have been no other responses, I'm guessing that others are > > > > confused by your question as well. Can you describe the actual problem > > > > that you're seeing? > > > > > > > > -- > > > > Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> > > > > > > > > > > > > -- > > > > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > > > > To make changes to your subscription: > > > > http://www.postgresql.org/mailpref/pgsql-general > > > > > > > > > > > > > > > > -- > > > David Haynes, Ph.D. > > > Research Associate Terra Populus > > > Minnesota Population Center > > > www.terrapop.org > > > > > > -- > > Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> > > > > > > -- > David Haynes, Ph.D. > Research Associate Terra Populus > Minnesota Population Center > www.terrapop.org -- Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general