Martha Stewart called it a Good Thing when cshobe@xxxxxxxx ("Casey Allen Shobe") wrote: > I posted about this a couple days ago on dspam-dev... > > I am using DSpam with PostgreSQL, and like you discovered the horrible > performance. The reason is because the default PostgreSQL query planner > settings determine that a sequence scan will be more efficient than an > index scan, which is wrong. To correct this behavior, adjust the query > planner settings for the appropriate table/column with this command: > > alter table "dspam_token_data" alter "token" set statistics 200; analyze; > > Let me know if it help you. It worked wonders for me. That makes a great deal of sense; the number of tokens are likely to be rather larger than 10, and are likely to be quite unevenly distributed. That fits with the need you found to collect more statistics on that column. Other cases where it seems plausible that it would be worthwhile to do the same: alter table dspam_signature_data alter signature set statistics 200; alter table dspam_neural_data alter node set statistics 200; alter table dspam_neural_decisions alter signature set statistics 200; Lionel's suggestion of having a functional index on dspam_token_data (innocent_hits + spam_hits) also seems likely to be helpful. Along with that, it might prove necessary to alter stats on dspam_token_data thus: alter table dspam_token_data alter innocent_hits set statistics 200; alter table dspam_token_data alter spam_hits set statistics 200; None of these changes are likely to make things materially worse; if they do help, they'll help rather a lot. -- (format nil "~S@~S" "cbbrowne" "gmail.com") http://www.ntlug.org/~cbbrowne/nonrdbms.html Rules of the Evil Overlord #112. "I will not rely entirely upon "totally reliable" spells that can be neutralized by relatively inconspicuous talismans." <http://www.eviloverlord.com/>