Search Postgresql Archives

Re: Comparison of Oracle and PostgreSQL full text search

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 29, 2010 at 10:33 PM, Dean Rasheed <dean.a.rasheed@xxxxxxxxx> wrote:
> On 28 July 2010 02:58, Howard Rogers <hjr@xxxxxxxxxx> wrote:
>> For what it's worth, I wrote up the performance comparison here:
>> http://diznix.com/dizwell/archives/153
>>
>
> Thanks, very interesting results. I wonder, are the results being
> sorted by the database? The performance degradation for large numbers
> of results might be explained by it switching over from an internal to
> an external sort, in which case tweaking work_mem might make a
> difference.
>
> Of course this is pure speculation without the EXPLAIN ANALYSE output.
>
> Regards,
> Dean


Yes, the results were being sorted. I did various tests, changing
work_mem, shared_buffers and much else, one by one, until I arrived at
the combination of settings that gave me the best 'total search time'
results. Personally, I couldn't see any difference in the explain
plans, but I was in a bit of a hurry and I may have missed it.

For the search term 'woman', which matches 1,590,275 documents, here's
the explain plan:

"Sort  (cost=185372.88..185372.93 rows=20 width=312) (actual
time=10537.152..10537.154 rows=20 loops=1)"
"  Sort Key: a.rf, a.sort_id"
"  Sort Method:  quicksort  Memory: 48kB"
"  ->  Result  (cost=109119.55..185372.45 rows=20 width=312) (actual
time=4309.020..10537.116 rows=20 loops=1)"
"        ->  Append  (cost=109119.55..185372.45 rows=20 width=312)
(actual time=4309.018..10537.108 rows=20 loops=1)"
"              ->  Subquery Scan a  (cost=109119.55..109119.68 rows=10
width=312) (actual time=4309.018..4309.026 rows=10 loops=1)"
"                    ->  Limit  (cost=109119.55..109119.58 rows=10
width=641) (actual time=4309.016..4309.019 rows=10 loops=1)"
"                          ->  Sort  (cost=109119.55..109121.94
rows=957 width=641) (actual time=4309.014..4309.015 rows=10 loops=1)"
"                                Sort Key: search_rm.sort_id"
"                                Sort Method:  top-N heapsort  Memory: 35kB"
"                                ->  Bitmap Heap Scan on search_rm
(cost=6651.07..109098.87 rows=957 width=641) (actual
time=272.851..4021.458 rows=583275 loops=1)"
"                                      Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
"                                      Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
"                                      ->  Bitmap Index Scan on
rmsearch_idx  (cost=0.00..6650.83 rows=25826 width=0) (actual
time=165.711..165.711 rows=586235 loops=1)"
"                                            Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
"              ->  Subquery Scan b  (cost=76252.65..76252.77 rows=10
width=312) (actual time=6228.073..6228.080 rows=10 loops=1)"
"                    ->  Limit  (cost=76252.65..76252.67 rows=10
width=727) (actual time=6228.072..6228.075 rows=10 loops=1)"
"                          ->  Sort  (cost=76252.65..76254.29 rows=655
width=727) (actual time=6228.071..6228.072 rows=10 loops=1)"
"                                Sort Key: search_rf.sort_id"
"                                Sort Method:  top-N heapsort  Memory: 38kB"
"                                ->  Bitmap Heap Scan on search_rf
(cost=5175.18..76238.49 rows=655 width=727) (actual
time=363.684..5748.279 rows=1007000 loops=1)"
"                                      Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
"                                      Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
"                                      ->  Bitmap Index Scan on
rfsearch_idx  (cost=0.00..5175.02 rows=17694 width=0) (actual
time=242.859..242.859 rows=1030282 loops=1)"
"                                            Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
"Total runtime: 10538.832 ms"


And here's the plan for the search term "clover",  which matches only
2,808 records in total:

"  ->  Result  (cost=109119.55..185372.45 rows=20 width=312) (actual
time=16.807..23.990 rows=20 loops=1)"
"        ->  Append  (cost=109119.55..185372.45 rows=20 width=312)
(actual time=16.806..23.985 rows=20 loops=1)"
"              ->  Subquery Scan a  (cost=109119.55..109119.68 rows=10
width=312) (actual time=16.806..16.812 rows=10 loops=1)"
"                    ->  Limit  (cost=109119.55..109119.58 rows=10
width=641) (actual time=16.805..16.807 rows=10 loops=1)"
"                          ->  Sort  (cost=109119.55..109121.94
rows=957 width=641) (actual time=16.804..16.805 rows=10 loops=1)"
"                                Sort Key: search_rm.sort_id"
"                                Sort Method:  top-N heapsort  Memory: 35kB"
"                                ->  Bitmap Heap Scan on search_rm
(cost=6651.07..109098.87 rows=957 width=641) (actual
time=1.054..15.577 rows=1807 loops=1)"
"                                      Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
"                                      Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
"                                      ->  Bitmap Index Scan on
rmsearch_idx  (cost=0.00..6650.83 rows=25826 width=0) (actual
time=0.615..0.615 rows=1807 loops=1)"
"                                            Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
"              ->  Subquery Scan b  (cost=76252.65..76252.77 rows=10
width=312) (actual time=7.161..7.166 rows=10 loops=1)"
"                    ->  Limit  (cost=76252.65..76252.67 rows=10
width=727) (actual time=7.161..7.163 rows=10 loops=1)"
"                          ->  Sort  (cost=76252.65..76254.29 rows=655
width=727) (actual time=7.160..7.161 rows=10 loops=1)"
"                                Sort Key: search_rf.sort_id"
"                                Sort Method:  top-N heapsort  Memory: 35kB"
"                                ->  Bitmap Heap Scan on search_rf
(cost=5175.18..76238.49 rows=655 width=727) (actual time=0.433..6.642
rows=1001 loops=1)"
"                                      Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
"                                      Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
"                                      ->  Bitmap Index Scan on
rfsearch_idx  (cost=0.00..5175.02 rows=17694 width=0) (actual
time=0.250..0.250 rows=1045 loops=1)"
"                                            Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
"Total runtime: 24.143 ms"


I can't see any change to the sorting behaviour there. Work_mem was
set to 4096MB, shared buffers to 12228MB, temp_buffers to 1024MB,
effective_cache_size to 18442MB.

Sadly, I won't be able to provide much further analysis or
information, because the box concerned is being wiped. The MD decided
that, as a matter of corporate governance, he couldn't punt the
company on PostgreSQL, so my experimenting days are over. Back to
Oracle: slower, but with a support contract he can sue on, I guess!

Regards
HJR

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux