Re: Help with Query Tuning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks, I understand it know :-

But My one doubt which isn't clear  :

Original Query :-

select  count(*)  from page_content where (content like '%Militant%'
OR content like '%jihad%' OR  content like '%Mujahid%'  OR
 content like '%fedayeen%' OR content like '%insurgent%'  OR content like '%terrORist%' OR
  content like '%cadre%'  OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');

Output :-
 count
-------
 57061
(1 row)

Time: 19726.555 ms

I need to tune it , use full-text searching as :

Modified Query :-

SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb');

Output :-
 count
-------
     0
(1 row)

Time: 194685.125 ms

I try, SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');

 count
-------
     0
(1 row)

Time: 194722.468 ms

I know I have to create index but index is the next step, first you have to get the correct result .

CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content));


Please guide me where I am going wrong.


Thanks & best Regards,

Adarsh Sharma
Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
  
Dear all,

I am facing a problem while  creating the index to make the below query run 
faster. My table  size is near about 1065 MB and 428467 rows.

explain analyze select  count(*)  from page_content where publishing_date 
like '%2010%' and content_language='en'  and content is not null and 
isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content like 
'%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR content 
like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content 
like '%kill%' or content like '%injure%');

*Output:

* Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual 
time=18564.631..18564.631 rows=1 loops=1)
  ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 width=0) 
(actual time=0.146..18529.371 rows=59918 loops=1)
        Filter: ((content IS NOT NULL) AND (publishing_date ~~ 
'%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 
1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND 
(((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ 
'%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ 
'%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR 
((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ 
'%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR 
((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR 
((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) 
OR ((content)::text
~~ '%ssb%'::text)))
Total runtime: 18564.673 ms


*Index on that Table :

*CREATE INDEX idx_page_id
 ON page_content
 USING btree
 (crawled_page_id);

*Index I create :*
CREATE INDEX idx_page_id_content
 ON page_content
 USING btree
 (crawled_page_id,content_language,publishing_date,isprocessable);

*Index that fail to create:

*CREATE INDEX idx_page_id_content1
 ON page_content
 USING btree
 (crawled_page_id,content);

Error :-ERROR:  index row requires 13240 bytes, maximum size is 8191
********** Error **********

ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000

How to resolve this error
Please give any suggestion to tune the query.

Thanks & best Regards,

Adarsh Sharma

    

You should probably be looking at using full-text indexing:

http://www.postgresql.org/docs/9.0/static/textsearch.html

or limit the size of content for the index.

Cheers,
Ken
  


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux