On Tue, Apr 4, 2017 at 8:20 AM, Daniel Westermann <daniel.westermann@xxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> PostgreSQL 9.6.2 on CentOS 7.3 x64.
>
> This is my data set:
>
> drop table if exists users;
> drop table if exists ids;
> create table users ( user_id int
>                    , username varchar(50)
>                    );
> with generator as
> ( select a.*
>     from generate_series (1,3000000) a
>    order by random()
> )
> insert into users ( user_id
>                   , username
>                   )
> select a
>      , md5(a::varchar)
>   from generator;
> create unique index i_users on users ( user_id );
> create table ids ( id int );
> insert into ids (id) values ( generate_series ( 2500000, 3500000 ) );
> create unique index i_ids on ids ( id );
> analyze users;
> analyze ids;
>
> I have set work_mem to a very low value intentionally for demonstration
> purposes:
>
> postgres=# show work_mem;
>  work_mem
> ----------
>  16MB
> (1 row)
>
> postgres=# show shared_buffers ;
>  shared_buffers
> ----------------
>  128MB
> (1 row)
>
>
> When I run the following query ( I know that "not in" is not a good choice
> here ):
>
> postgres=# select count(user_id) from users where user_id not in ( select id
> from ids);

"NOT IN" where the predicate is a table column can lead to very poor query
plans, especially when the haystack is not provably known (at plan time) to
contain only non-null values. Because you reduced work_mem, the server has
decided it has to repeatedly scan the table to check for the presence of
null values. Try converting the query to NOT EXISTS.

merlin
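
For reference, a minimal sketch of the NOT EXISTS form of the query above
(the "u" and "i" aliases are introduced here purely for illustration):

-- equivalent anti-join formulation of the original NOT IN query
select count(u.user_id)
  from users u
 where not exists ( select 1
                      from ids i
                     where i.id = u.user_id );

With this data set the result is the same, since ids.id contains no nulls,
but the two forms are not equivalent in general: NOT IN yields no rows at
all if the subquery produces a null, while NOT EXISTS simply treats each
non-match as true. The planner can also execute NOT EXISTS as an anti join,
which, unlike the hashed subplan used for NOT IN, does not require the
subquery result to fit in work_mem.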