Thanks for the answers.

> On 30/06/2014 at 20.04, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>
> On Wed, Jun 25, 2014 at 1:48 AM, Niels Kristian Schjødt
> <nielskristian@xxxxxxxxxxxxx> wrote:
>>
>> Hi,
>> I'm running a search engine for cars. It's backed by a PostgreSQL 9.3
>> installation.
>>
>> Now I'm unsure about the best approach/strategy for index optimization
>> for the frontend search.
>>
>> The problem:
>>
>> The table containing the cars holds around 1.5 million rows. People who
>> search for cars use different criteria. Some search by brand/model, some
>> by year, some by mileage, some by price and some by special equipment,
>> etc. - and they often combine a whole bunch of criteria. Of course some,
>> like brand/model and price, are used more frequently than others. In
>> total we offer: 9 category criteria like brand/model or body type, plus
>> 5 numeric criteria like price or mileage, plus 12 boolean criteria like
>> equipment. Lastly, people can order the results by different columns
>> (year, price, mileage and a score we compute for the cars). By default
>> we order by our own generated score.
>>
>> What I've done so far:
>>
>> I have analyzed the usage of the criteria "lightly" and created a few
>> indexes (10). Among those are e.g. indexes on price and mileage, and a
>> combined index on brand/model. Since we are only interested in showing
>> results for cars which are actually for sale, the indexes are partial
>> indexes on a sales-state column.
>
> I'd probably partition the data on whether it is for sale, and then
> search only the for-sale partition.

Hmm okay, I already made all the indexes partial on the for-sale state. If
the queries always filter on for-sale, and all the indexes are partial on
that condition, will it then still help performance / make sense to
partition the tables?

>>
>> Questions:
>>
>> 1. How would you go about analyzing and determining which columns should
>> be indexed, and how?
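The partial-index setup described above can be sketched as follows. Table and column names (cars, brand, model, sales_state) are assumptions for illustration, not taken from the thread:

```sql
-- A combined index on brand/model, restricted to cars that are for sale.
-- The index predicate must be implied by the query's WHERE clause for the
-- planner to consider this index.
CREATE INDEX cars_brand_model_for_sale_idx
    ON cars (brand, model)
    WHERE sales_state = 'for_sale';

-- A query that can use the index: it repeats the predicate and filters
-- on the leading index columns.
SELECT *
FROM cars
WHERE sales_state = 'for_sale'
  AND brand = 'Volvo'
  AND model = 'V70';
```

One advantage of partial indexes here is that rows for sold cars never enter the index at all, which keeps it small and more likely to stay cached.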
>
> I'd start out with intuition about which columns are likely to be used
> most often, and in a selective way, and follow up by logging slow
> queries so they can be dissected at leisure.
>
>> 2. What is the best strategy when optimizing indexes for searches
>> happening on 20+ columns, where the use and the combinations vary a
>> lot? (Just index everything, index some of the columns, use combined
>> indexes, use only single-column indexes, etc.?)
>
> There is no magic index. Based on your description, you are going to
> be seq scanning your table a lot. Focus on making it as small as
> possible: horizontally partition it so that the not-for-sale entries
> are hived off to a historical table, and vertically partition it so
> that large columns rarely used in the WHERE clause are in a separate
> table. (Ideally you would tell PostgreSQL to aggressively TOAST those
> columns, but there is no knob with which to do that.)
>
>> 3. I expect that it does not make sense to index all columns?
>
> You mean individually, or jointly? Either way, probably not.
>
>> 4. I expect it does not make sense to index boolean columns?
>
> In some cases it can, for example if the data distribution is very
> lopsided and the value on the smaller side is frequently specified.
>
>> 5. Is it better to have a combined index on 5 frequently used columns
>> than individual indexes on each of them?
>
> How often are the columns specified together? If they are completely
> independent, it probably makes little sense to index them together.
>
>> 6. Would it be a good idea to have all indexes sorted by my default
>> sorting?
>
> You don't get to choose. A btree index is sorted by the columns
> specified in the index, according to the operators specified (or
> defaulted). Unless you mean that you want to make the default sort
> column the lead column in each index - that actually might make sense.
>
>> 7. Do you have any experience with other approaches that could greatly
>> improve performance (e.g. forcing indexes to stay in memory, etc.)?
>
> If your queries are as unstructured as you imply, I'd forget about
> indexes for the most part, as you will have a hard time finding ones
> that work. Concentrate on making seq scans as fast as possible. If
> most of your queries end in something like "ORDER BY price LIMIT 10",
> then concentrate on index scans over price. You will probably want to
> include heuristics in your UI such that if people configure queries to
> download half your database, you disallow that. You will probably find
> that 90% of the workload comes from people who are just playing around
> with your website and don't actually intend to do business with you.
>
> Cheers,
>
> Jeff
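Jeff's last suggestion - optimizing for queries that end in "ORDER BY price LIMIT 10" - relies on a btree index returning rows already in sorted order, so the scan can stop after the first few matches instead of sorting the whole result set. A minimal sketch, again with hypothetical table and column names:

```sql
-- A partial index on price: because a btree stores entries in price
-- order, an "ORDER BY price LIMIT n" query can walk the index and stop
-- after n matching rows.
CREATE INDEX cars_price_for_sale_idx
    ON cars (price)
    WHERE sales_state = 'for_sale';

-- Verify that the planner chooses a bounded index scan rather than a
-- seq scan plus sort:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, brand, model, price
FROM cars
WHERE sales_state = 'for_sale'
ORDER BY price
LIMIT 10;
```

The same idea applies to the default score ordering mentioned earlier: an index led by the score column would let the default listing be served by a bounded index scan as well.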