Search Postgresql Archives

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Try the pg_tgrm extension. It is a rich set of operators.


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 23:39, balasubramanian c r <crbs.siebel@xxxxxxxxx> wrote:
HI Ninad

Thanks for your reply.
If bitmap index should not be used. Do i need to disable it for the time being and carry out the test.

The documentation in pg_similarity shows that index can be created on text column using gin with gin_similarity_ops.
The same way the index is created like
CREATE INDEX on address using GIN(complete_address gin_similarity_ops);

AFAIK I have not seen any other operators other than gin in the pg_smilarity extension.

Thanks
C.R.Bala

On Thu, Sep 2, 2021 at 8:17 PM Ninad Shah <nshah.postgres@xxxxxxxxx> wrote:
Hi Bala,

Are your statistics updated? Also, have you used the gin operator(gin_similarity_ops) correctly?

It is fetching just 6 records out of a million, hence, it should not go for bitmap index scan. As bitmap index scan loads a complete index, and access relevant pages from the table later by bitmap heap scan.


Regards,
Ninad Shah


On Thu, 2 Sept 2021 at 16:39, balasubramanian c r <crbs.siebel@xxxxxxxxx> wrote:

Hi Team

We have encountered a problem in our testing environment.
I have a scenario where I am running a similarity match for an address
I have created a table with following number of records
1603423

We are using pg_similarity extension in postgresql version is 13.

And I have created GIN index (since i am using pg_similarity) library jaccard similarity method
when I run the Explain analyze
EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad 201017' order by qsim DESC;
QUERY PLAN

Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual time=12101.194..12101.197 rows=6 loops=1)

Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)) DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93) (actual time=3516.233..12101.172 rows=6 loops=1)
Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Rows Removed by Index Recheck: 1039186
Heap Blocks: exact=58018
-> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02 rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Planning Time: 0.141 ms
Execution Time: 12101.245 ms
(11 rows)
it took 12 seconds
following are my pgconf file settings
shared buffer as 4GB
work_mem 256 MB
maintenence_work_mem 512MB
autovacuum_work_mem 20MB
My index definition is this "address_complete_address_idx1" gin (complete_address gin_similarity_ops)

It is taking the index correctly. But why it took 12 seconds to process I really don't understand.

Please help.

Thanks
C.R.Bala


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux