Search Postgresql Archives

Re: Planner: rows=1 after "similar to" where condition.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>-----Original Message-----
>From: Scott Marlowe [mailto:scott.marlowe@xxxxxxxxx] 
>Sent: Monday, 25 February 2008 7:14
>To: Joris Dobbelsteen
>Cc: pgsql-general@xxxxxxxxxxxxxx
>Subject: Re:  Planner: rows=1 after "similar to" 
>where condition.
>
>On Sun, Feb 24, 2008 at 4:35 PM, Joris Dobbelsteen 
><Joris@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> I seem to have some planner oddity, where it seems to completely  
>> mispredict the output after a regex compare. I've seem it on other  
>> occasions, where it completely screws up the join. You can note the  
>> "rows=1" after the filter.
>>  A similar sitution has occurred when doing a regex filter in a 
>> subquery,  which was subsequently predited as 1 row and triggered 
>> (oddly enough) a  sequencial scan. Doing the same using 
>"equality" on 
>> the result to  substring(<text> from <regex>) seemed to work and 
>> produced a useful  plan, since it did a hash-join (as it 
>should have).
>>  Is this a known problem? Otherwise I think I should build a smaller 
>> test  case...
>>
>>  Using Postgresql 8.2.6 from Debian Etch-backports.

Should be:
PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian
4.1.1-21).
Should have paid closer attention.

>>
>>  "Bitmap Heap Scan on log_syslog syslog  (cost=13124.26..51855.25 
>> rows=1  width=270)"
>>  "  Recheck Cond: (((program)::text = 'amavis'::text) AND  
>> ((facility)::text = 'mail'::text))"
>>  "  Filter: ***SOME VERY LONG SIMILAR TO REGEX****"
>>  "  ->  BitmapAnd  (cost=13124.26..13124.26 rows=18957 width=0)"
>>  "        ->  Bitmap Index Scan on "IX_log_syslog_program"
>>  (cost=0.00..2223.95 rows=92323 width=0)"
>>  "              Index Cond: ((program)::text = 'amavis'::text)"
>>  "        ->  Bitmap Index Scan on "IX_log_syslog_facility"
>>  (cost=0.00..10899.81 rows=463621 width=0)"
>>  "              Index Cond: ((facility)::text = 'mail'::text)"
>
>It's not saying it will only get one row back for sure, it's 
>saying it thinks it will return one row.  and depending on 
>your query, it might.
> What's the query, and what's the explain analyze of that query?
>

See the attached file for the query and the explain (hopefully this
gives a consistent view and maintains the layout for easier reading).

The point is that it will NOT, not even close. The planner guesses 1
row, but the output was arround 13000 rows (of the 2.2M rows in the
table). Oddly enough the 18k rows on the bitmap and seems a very good
estimation. In fact, if I omit the "SIMILAR TO", it estimates ~12000
rows, which is spot on. So it seems the SIMILAR TO really gets the
planner confused.

The real value was 12981 rows that were returned by the query in the
first case. However, since I removed this data from the original table
(its now somewhere else), I cannot present the original EXPLAIN ANALYZE
any more. The new dataset only contains ~137 (but I still have the old
statistics, I think, or at least they provide the same predictions).
I also included a run after EXPLAIN ANALYZE on the current dataset.

Hopefully this helps.

Thanks,

- Joris

========QUERY=========



SELECT	a.msgid, a.rcv_time, a.sent_time, a.hostname, 
	a.passedas, a.from, a.to,
	a.message_id, a.resent_message_id,
	a.mail_id,
	a.queue_id,
	convert_score(a.spamscore) AS spamscore,
	CAST(a.duration AS integer)
FROM	(SELECT	syslog.msgid, syslog.rcv_time, syslog.sent_time, syslog.hostname,
		substring(syslog.text from 'amavis% Passed #"%#", % <%> -> <%>, Message-ID: <%>, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as passedas,
		substring(syslog.text from 'amavis% Passed %, % <#"%#"> -> <%>, Message-ID: <%>, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as from,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <#"%#">, Message-ID: <%>, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as to,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <%>, Message-ID: <#"[^<>]+#">, (Resent-Message-ID: <#"%#">, |)mail_id: %, Hits: %, queued_as: %, % ms' for '#') as message_id,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <%>, Message-ID: <%>, Resent-Message-ID: <#"%#">, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as resent_message_id,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <%>, Message-ID: <%>, mail_id: #"%#", Hits: %, queued_as: %, % ms' for '#') as mail_id,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <%>, Message-ID: <%>, mail_id: %, Hits: #"%#", queued_as: %, % ms' for '#') as spamscore,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <%>, Message-ID: <%>, mail_id: %, Hits: %, queued_as: #"%#", % ms' for '#') as queue_id,
		substring(syslog.text from 'amavis% Passed %, % <%> -> <%>, Message-ID: <%>, mail_id: %, Hits: %, queued_as: %, #"%#" ms' for '#') as duration,
		syslog.text,
		syslog.facility, syslog.priority, syslog.program
	 FROM public.log_syslog syslog
	 WHERE syslog.program = 'amavis'
	 AND syslog.facility = 'mail'
	 AND syslog.priority = 'notice'
	 AND syslog.text SIMILAR TO 'amavis% Passed [A-Za-z0-9]+, [][0-9.]* <[^<>]+> -> <[^<>]+>, Message-ID: <[^<>]+>, (Resent-Message-ID: <[^<>]+>, |)mail_id: [^ ,]+, Hits: [-+0-9.,]+, queued_as: [^ ,]+, [0-9]+ ms'
	) a
;




========EXPLAIN=========

"Bitmap Heap Scan on log_syslog syslog  (cost=13196.27..51927.22 rows=1 width=270)"
"  Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text))"
"  Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis.* Passed [A-Za-z0-9]+, [][0-9\\.]* <[^<>]+> -> <[^<>]+>, Message-ID: <[^<>]+>, (Resent-Message-ID: <[^<>]+>, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text))"
"  ->  BitmapAnd  (cost=13196.27..13196.27 rows=18957 width=0)"
"        ->  Bitmap Index Scan on "IX_log_syslog_program"  (cost=0.00..2235.95 rows=92323 width=0)"
"              Index Cond: ((program)::text = 'amavis'::text)"
"        ->  Bitmap Index Scan on "IX_log_syslog_facility"  (cost=0.00..10959.81 rows=463621 width=0)"
"              Index Cond: ((facility)::text = 'mail'::text)"

========EXPLAIN ANALYZE=========

"Bitmap Heap Scan on log_syslog syslog  (cost=13199.27..51930.22 rows=1 width=270) (actual time=8958.762..47095.587 rows=138 loops=1)"
"  Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text))"
"  Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis.* Passed [A-Za-z0-9]+, [][0-9\\.]* <[^<>]+> -> <[^<>]+>, Message-ID: <[^<>]+>, (Resent-Message-ID: <[^<>]+>, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text))"
"  ->  BitmapAnd  (cost=13199.27..13199.27 rows=18957 width=0) (actual time=411.696..411.696 rows=0 loops=1)"
"        ->  Bitmap Index Scan on "IX_log_syslog_program"  (cost=0.00..2238.95 rows=92323 width=0) (actual time=111.073..111.073 rows=97779 loops=1)"
"              Index Cond: ((program)::text = 'amavis'::text)"
"        ->  Bitmap Index Scan on "IX_log_syslog_facility"  (cost=0.00..10959.81 rows=463621 width=0) (actual time=291.897..291.897 rows=466392 loops=1)"
"              Index Cond: ((facility)::text = 'mail'::text)"
"Total runtime: 47096.687 ms"



==================================
========= WITHOUT REGEX ==========
Just comment out the SIMILAR TO line.

========EXPLAIN=========

"Bitmap Heap Scan on log_syslog syslog  (cost=13208.13..52496.93 rows=11867 width=270)"
"  Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text))"
"  Filter: ((priority)::text = 'notice'::text)"
"  ->  BitmapAnd  (cost=13208.13..13208.13 rows=18957 width=0)"
"        ->  Bitmap Index Scan on "IX_log_syslog_program"  (cost=0.00..2235.95 rows=92323 width=0)"
"              Index Cond: ((program)::text = 'amavis'::text)"
"        ->  Bitmap Index Scan on "IX_log_syslog_facility"  (cost=0.00..10959.81 rows=463621 width=0)"
"              Index Cond: ((facility)::text = 'mail'::text)"

========EXPLAIN ANALYZE=========

"Bitmap Heap Scan on log_syslog syslog  (cost=13214.13..52502.93 rows=11867 width=270) (actual time=731.061..55188.298 rows=10045 loops=1)"
"  Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text))"
"  Filter: ((priority)::text = 'notice'::text)"
"  ->  BitmapAnd  (cost=13214.13..13214.13 rows=18957 width=0) (actual time=672.800..672.800 rows=0 loops=1)"
"        ->  Bitmap Index Scan on "IX_log_syslog_program"  (cost=0.00..2238.95 rows=92323 width=0) (actual time=103.780..103.780 rows=97784 loops=1)"
"              Index Cond: ((program)::text = 'amavis'::text)"
"        ->  Bitmap Index Scan on "IX_log_syslog_facility"  (cost=0.00..10962.81 rows=463621 width=0) (actual time=560.112..560.112 rows=466409 loops=1)"
"              Index Cond: ((facility)::text = 'mail'::text)"
"Total runtime: 55201.639 ms"


=========================================
========= AFTER VACUUM ANALYZE ==========

========EXPLAIN ANALYZE=========

"Bitmap Heap Scan on log_syslog syslog  (cost=12773.57..47836.81 rows=1 width=267) (actual time=12377.693..33147.458 rows=142 loops=1)"
"  Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text))"
"  Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis.* Passed [A-Za-z0-9]+, [][0-9\\.]* <[^<>]+> -> <[^<>]+>, Message-ID: <[^<>]+>, (Resent-Message-ID: <[^<>]+>, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text))"
"  ->  BitmapAnd  (cost=12773.57..12773.57 rows=16636 width=0) (actual time=666.590..666.590 rows=0 loops=1)"
"        ->  Bitmap Index Scan on "IX_log_syslog_program"  (cost=0.00..1998.97 rows=82632 width=0) (actual time=191.218..191.218 rows=84828 loops=1)"
"              Index Cond: ((program)::text = 'amavis'::text)"
"        ->  Bitmap Index Scan on "IX_log_syslog_facility"  (cost=0.00..10774.10 rows=456489 width=0) (actual time=466.576..466.576 rows=453525 loops=1)"
"              Index Cond: ((facility)::text = 'mail'::text)"
"Total runtime: 33155.708 ms"

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux