Re: Fwd: Raster performance

PT <wmoran@xxxxxxxxxxxxxxxxx> · Wed, 27 May 2015 17:31:40 -0400

On Tue, 26 May 2015 12:52:24 -0500
David Haynes II <dahaynes@xxxxxxx> wrote:

> Hello,
> 
> I have a question about the query optimizer and its performance on spatial
> datasets, specifically rasters. My use case is rather unique, the
> application that I am developing allows users to request summarizations of
> various geographic boundaries around the world. Therefore our raster
> datasets are global. We are in the process of conducting some benchmarks
> for our system and we noticed something unexpected.
> 
> The query is the same except the first is run on a raster (46gigs) in out
> of database (outdb) and the second is the same raster (46gigs) stored in
> database (indb). The raster is multibanded (13), with each band
> representing one entire MODIS global scene. A single year of MODIS is
> approximately 3.6 gigs.
> 
> The outdb is being out performed by indb, because the query optimizer gets
> smarter. But what is also interesting is all the extra pieces that are
> brought in with outdb.
> 
> with poly as
> ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
> us_counties )
> , rast_select as
> ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast, s.geom) )
> select r.id, r.name, ST_Count(r.rast, 1, True)
> 
> 
>    QUERY PLAN With Outdb
> --------------------------------------------------------------------------------------------------
> Sort   (cost=93911.29..93926.80 rows=6204 width=254)
>   Sort Key: r.id, r.name
>   CTE poly
>     ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
>   CTE rast_select
>     ->  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
>           ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
>           ->  Index Scan using modis_rast_gist on modis r_1
>  (cost=0.28..24.40 rows=2 width=836)
>                 Index Cond: ((rast)::geometry && s.geom)
>                 Filter: _st_intersects(s.geom, rast, NULL::integer)
>   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
>         ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> width=254)
> 
>                                                 QUERY PLAN With Indb
> 
> -------------------------------------------------------------------------------------------------------------
> Sort   (cost=69547.29..69562.80 rows=6204 width=254)
>   Sort Key: r.id, r.name
>   CTE poly
>     ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
>   CTE rast_select
>     ->  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
>           ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
>           ->  Index Scan using modis_noout_rast_gist on modis_noout r_1
>  (cost=0.28..16.56 rows=2 width=22)
>                 Index Cond: ((rast)::geometry && s.geom)
>                 Filter: _st_intersects(s.geom, rast, NULL::integer)
>   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
>         ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> width=254)

I could be missing something here, but I don't see how long the queries
actually take to run. Have you actually run the queries and timed them?
Keep in mind that analyze does not actually run the query, it only plans
it, so the actual run time is unknown if all you do is analyze.

The query plans appear to be equal, assuming there are slight variances
in the names of tables from one DB to another (and I assume that your
description of indb and outdb reflects the fact that there are (for
reasons unknown) two copies of the data).

The only purpose to those estimates is to choose a good plan. If the
plan is bad for one database and both databases have the same data, then
the plan will be bad for both.

Since there have been no other responses, I'm guessing that others are
confused by your question as well. Can you describe the actual problem
that you're seeing?

-- 
Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general