Re: Geometric types row estimation

Igor ALBUQUERQUE SILVA <i.albuquerque-silva@xxxxxxxxxxx> · Wed, 30 Nov 2022 18:38:16 +0100

Hi Tom,
Thanks a lot for the explanation, I thought the built-in types were more standard, so I didn't mention that I was having the same thing using postgis. Here's the example (I changed the values a little bit to avoid rounding errors):

create table test(p geometry(point));
insert into test(p) values (st_makepoint(0,0));
insert into test(p) values (st_makepoint(0,1));
insert into test(p) values (st_makepoint(1,0));
insert into test(p) values (st_makepoint(1,1));
insert into test(p) values (st_makepoint(50,0));
analyze test;
explain analyze select * from test where ST_Contains(ST_GeomFromText('POLYGON((-1 -1,2 -1,2 2,-1 2,-1 -1))'), p);
explain analyze select * from test where ST_Contains(ST_GeomFromText('POLYGON((49 -1,51 -1,51 1,49 1,49 -1))'), p);

EXPLAIN ANALYZE:

 Seq Scan on test  (cost=0.00..126.05 rows=1 width=32) (actual time=0.015..0.022 rows=4 loops=1)
   Filter: st_contains('01030000000100000005000000000000000000F0BF000000000000F0BF0000000000000040000000000000F0BF00000000000000400000000000000040000000000000F0BF0000000000000040000000000000F0BF000000000000F0BF'::geometry, p)
   Rows Removed by Filter: 1
 Planning Time: 0.072 ms
 Execution Time: 0.035 ms
(5 rows)

Do you know if the functions in Postgis are also stubbed? Or maybe I'm doing something wrong with the syntax?

This time I'm using the postgis docker image, PostgreSQL 15.1 (Debian 15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

Best regards,
Igor

On Wed, 30 Nov 2022 at 18:18, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Igor ALBUQUERQUE SILVA <i.albuquerque-silva@xxxxxxxxxxx> writes:

> I'm having a problem regarding the point type/gist indexes. Here's a

> minimal reproduction of it:

> ...

> What I was expecting is the first query to estimate 4 rows and the second

> to estimate 1, like what I get If I try the same thing using integers.

Unfortunately, the selectivity estimation functions for PG's geometric

types are mostly just stubs.  The estimation function for point <@ box

in particular is contsel [1]:

/*

 *      contsel -- How likely is a box to contain (be contained by) a given box?

 *

 * This is a tighter constraint than "overlap", so produce a smaller

 * estimate than areasel does.

 */

Datum

contsel(PG_FUNCTION_ARGS)

{

        PG_RETURN_FLOAT8(0.001);

}

It's been like that (excepting notational changes) since Berkeley days,

because nobody has bothered to make it better.

In general, PG's built-in geometric types have never gotten much

beyond their origins as an academic proof-of-concept.  I think people

who are doing serious work that requires such operations mostly use

PostGIS, and I'd suggest looking into that.

Or, if you feel like doing a lot of work to make these estimators

better, have at it.

                        regards, tom lane

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/utils/adt/geo_selfuncs.c;hb=HEAD