Re: Performace Optimization for Dummies

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> So your program first transforms the data and then inserts it?  And it is
> the transforming process which is running select statements that is slow?

There are cross-referencing and deduplication processes. Does this person 
have an office at this exact address? In a similarily named building in the 
same zip code? City? What is the similarity of the building or enterprise 
names? Is there a person with a similar name with the same type of 
professional license nearby? We basically look for the statistical 
likelyhood that they already exist to decide whether to update their data, 
or insert a new data element.

These are all extremely soft queries and require left outer joins with all 
of the related tables that would contain this data (the left outer join 
tells us whether the related element satisfied the search condition). As I 
mentioned, as the data comes in, we examine what we have to work with and 
modify the tables and columns we can check - which is what I meant by " the 
SQL queries are mutable, every imported row can change the structure of a 
SQL query as the program adds and subtracts search conditions to the SQL 
command text before execution."

> If that is the case you could use duration logging to find the slow select
> statement, and then you could post an EXPLAIN ANALYZE of the select.

I'm pretty sure I know who the culprit is, and - like I said, it comes from 
a section of code that creates a mutable statement. However, everyone is 
being so helpful and I should post this data as soon as I can.

> One question off the top of my head is are you using regular expressions 
> for
> your fuzzy logic if so do your indexes have the right operator classes?
> (see http://www.postgresql.org/docs/8.1/static/indexes-opclass.html)

I am using regular expressions and fuzzy logic, but mostly on the client 
side (I have a Tcl implementation of levenshtein, for example). I don't 
think you can use indexes on functions such as levenshtein, because it 
requires a parameter only available at execution time. The link you sent me 
was very interesting - I will definitely reconsider my query logic if I can 
optimise regular expression searches on the server. Thanks!

Carlo 




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux