On 06-11-2012 16:42, Merlin Moncure wrote:
> On Tue, Nov 6, 2012 at 12:09 PM, Rodrigo Rosenfeld Rosas <rr.rosas@xxxxxxxxx> wrote:
>> http://explain.depesz.com/s/ToX (fast on 9.1)
>> http://explain.depesz.com/s/65t (fast on 9.2)
>> http://explain.depesz.com/s/gZm (slow on 9.1)
>> http://explain.depesz.com/s/END (slow on 9.2 - funny that the generated URL was END while this was my last explain :D)
>
> Hm -- looking at your 'slow' 9.2 query, it is reporting that the query took
> 3 seconds (reported times are in milliseconds). How are you timing the data?
> What happens when you run explain analyze <your_query> from psql (as in, how
> long does it take)?

The times I reported in the tables of my first message were the times reported
by pgAdmin3 (compiled from source), but I get a similar result when I run it
like this:

  time psql -p 5432 -f slow.sql db_name > slow-9.2-again.explain

  real    1m56.353s
  user    0m0.068s
  sys     0m0.020s

slow-9.2-again.explain: http://explain.depesz.com/s/zF1

Let me explain how the application works and how the database was designed,
and hopefully you'll be able to guide me to the right design for this use case.

Our application will present a big contract to some attorneys. There is
currently a dynamic template with around 800 fields to be extracted from each
contract in our system. These fields can be of different types (boolean,
string, number, currency, percentage, fixed options, dates, time spans and so
on). There is a fields tree that is maintained by the application editors.

The application will allow the attorneys to read the contracts, highlight the
parts of each contract where they extracted each field from, associate each
field with the value they interpreted, and store references to the paragraphs
in the contract that show where the value came from.

Then there is an interface that will allow clients to search for transactions
based on their associated contracts and those ~800 fields. For the particular
query above, 14 of the 800 fields were searched by this particular user (most
of them boolean, plus a few option fields and a string field). Usually the
queries perform much better when fewer than 10 fields are used in the
criteria, but our client wants us to handle up to 20 fields in a single query
or they won't close the deal, and this is a really important client for us.

So, for the time being, my only plan is to roll back to PG 9.1 and change my
query builder, which currently generates queries like slow.sql, so that it
generates queries like fast.sql instead. I'm pretty sure this approach should
be avoided, but I just don't know any other alternative for the time being.
What database design would you recommend for this use case?

> I would strongly consider investigation of hstore type along with gist/gin
> index.
>
> select * from company_transaction where contract_attributes @>
> 'State=>Delaware, Paid=Y';
>
> etc

I'm not very familiar with hstore yet; it was one of the reasons I wanted to
migrate to PG 9.2, but I won't be able to migrate the application quickly to
use hstore. Also, I'm not sure hstore allows us to be as flexible as we
currently are (c1 and (c2 or c3 and not (c4 and c5))). c == condition
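
Just so I understand the suggestion: would it be something along the lines of
the sketch below? (Table, column and key names here are only illustrative,
not our real schema.)

  -- single hstore column holding every extracted field value for a transaction
  CREATE EXTENSION hstore;

  CREATE TABLE transaction_fields (
      transaction_id integer PRIMARY KEY,
      fields         hstore NOT NULL
  );

  CREATE INDEX transaction_fields_gin_idx
      ON transaction_fields USING gin (fields);

  -- c1 and (c2 or c3 and not (c4 and c5)) written as ordinary boolean SQL
  -- over hstore operators (the keys and values below are made up):
  SELECT transaction_id
    FROM transaction_fields
   WHERE fields @> 'state=>Delaware'                       -- c1
     AND (fields @> 'paid=>Y'                              -- c2
          OR fields -> 'term_months' = '36'                -- c3
             AND NOT (fields ? 'penalty_clause'            -- c4
                      AND fields @> 'governing_law=>NY')); -- c5

If that's the idea, I suppose the arbitrary and/or/not combinations could
still be written as plain WHERE clauses; my worry is that, if I understand
the docs correctly, only the @> / ? / ?| / ?& parts can use the GIN index,
and every value would have to be stored as text.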
> Barring that, I would then consider complete elimination of integer proxies
> for your variables. They make your query virtually impossible to read/write,
> and they don't help.

I'm not sure I understood what you're talking about. The template is dynamic
and contains lots of information for each field, like type (number, percent,
string, date, etc.), parent_id (self-referencing), aggregator_id (also
self-referencing) and several other columns. The values associate the field
id (type_id) and the transaction id in a unique way (see the unique index in
my first message of the thread), and I need different tables to store the
actual values because we're using SQL instead of MongoDB or something else:
the table that stores the value depends on the field type.

Maybe it would help me understand if you could provide an example of the
design you're proposing.

Thank you very much,

Rodrigo
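
P.S.: In case it helps to make the current design more concrete, here is a
simplified sketch of the kind of structure we have today (the names and
columns below are only illustrative; the real tables have more columns):

  -- the dynamic template: one row per field, organized as a tree
  CREATE TABLE field_types (
      id            serial PRIMARY KEY,
      label         text NOT NULL,
      data_type     text NOT NULL,   -- boolean, string, number, currency, date, ...
      parent_id     integer REFERENCES field_types (id),
      aggregator_id integer REFERENCES field_types (id)
  );

  -- one value table per data type; each value belongs to one field (type_id)
  -- of one transaction, and that pair is unique
  CREATE TABLE boolean_values (
      transaction_id integer NOT NULL,
      type_id        integer NOT NULL REFERENCES field_types (id),
      value          boolean,
      UNIQUE (transaction_id, type_id)
  );

  CREATE TABLE string_values (
      transaction_id integer NOT NULL,
      type_id        integer NOT NULL REFERENCES field_types (id),
      value          text,
      UNIQUE (transaction_id, type_id)
  );

  -- ... plus similar tables for numbers, currencies, dates, options, etc.

Roughly speaking, each field used in a search becomes another join or
subquery against the value table that matches its type, which is why the
generated query grows so quickly as more fields are added to the criteria.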