Search Postgresql Archives

Re: Index is not used for "IN (non-correlated subquery)"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 30, 2016 at 11:05 AM, George <pinkisntwell@xxxxxxxxx> wrote:
> On Wed, Nov 30, 2016 at 6:45 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
>> On Wed, Nov 30, 2016 at 10:42 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>> George <pinkisntwell@xxxxxxxxx> writes:
>>>> explain select * from wg3ppbm_transaction where partner_uuid in (
>>>>             select p.uuid
>>>>             from wg3ppbm_userpartner up
>>>>             join wg3ppbm_partner p on p.id = up.partner_id
>>>> );
>>>
>>>> "Hash Semi Join  (cost=2.07..425.72 rows=2960 width=482)"
>>>> "  Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>>>> "  ->  Seq Scan on wg3ppbm_transaction  (cost=0.00..375.19 rows=5919 width=482)"
>>>> "  ->  Hash  (cost=2.06..2.06 rows=1 width=37)"
>>>> "        ->  Nested Loop  (cost=0.00..2.06 rows=1 width=37)"
>>>> "              Join Filter: (up.partner_id = p.id)"
>>>> "              ->  Seq Scan on wg3ppbm_userpartner up
>>>> (cost=0.00..1.01 rows=1 width=4)"
>>>> "              ->  Seq Scan on wg3ppbm_partner p  (cost=0.00..1.02
>>>> rows=2 width=41)"
>>>
>>> This plan is expecting to have to return about half of the rows in
>>> wg3ppbm_transaction, a situation for which an indexscan would NOT
>>> be a better choice.  The usual rule of thumb is that you need to be
>>> retrieving at most one or two percent of a table's rows for an indexscan
>>> on it to be faster than a seqscan.
>>>
>>> I think however that the "half" may be a default estimate occasioned
>>> by the other tables being empty and therefore not having any statistics.
>>> Another rule of thumb is that the plans you get for tiny tables have
>>> little to do with what happens once there's lots of data.
>>
>> Yeah, don't make query plan assumptions against empty or nearly empty
>> tables.  As the data grows, the plans will suitably change.  Perhaps
>> OP just recently loaded a bunch of data and the tables haven't been
>> analyzed yet?
>
> I just added a significant number of rows to the table. I now have
> 1.3M rows in total but only 8K rows that contain the value I am
> seeking. I also ran ANALYZE after loading the data. The query plans
> for the two queries did not change. Also, the simple query returns in
> 45 ms while the one with the subquery needs 1.5 s, i.e. it is about
> 30x slower.
>
> So there is definitely something wrong here. This situation makes many
> row-level security use cases cumbersome since you need to have
> almost the same WHERE clause both in the row-level security policy and
> in every SELECT query in order for the index to be used.

can you give EXPLAIN ANALYZE for the 'good' query and the 'bad' query?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux