Craig Ringer wrote:
> On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:
>
>>> You should probably re-generate your random value for each call rather
>>> than store it. Currently, every document with commonterm20 is guaranteed
>>> to also have commonterm40, commonterm60, etc, which probably isn't very
>>> realistic, and also makes doc size correlate with word rarity.
>>
>> I had that in the first version, but I wanted the guarantee that
>> a commonterm60 was indeed a subset of commonterm80; that's why it's
>> structured like that. I know it's not realistic, but it gives measurable
>> results, since I know my queries will hit the same tuples.
>>
>> I fail to see how this should have any direct effect on query time?
>
> Probably not, in truth, but with the statistics-based planner I'm
> occasionally surprised by what can happen.
>
>>> In this sort of test it's often a good idea to TRUNCATE the table before
>>> populating it with a newly generated data set. That helps avoid any
>>> residual effects from table bloat etc. lingering between test runs.
>>
>> As you can see in the scripts, the table is dropped just before it's
>> recreated and filled with data.
>>
>> Did you try to re-run the test?
>
> No, I didn't. I thought it worth checking whether bloat might be the cause
> first, though I should have read the scripts to confirm you weren't
> already handling that possibility.
>
> Anyway, I've generated your data set and run a test. After
> executing the test statement twice (once with and once without
> enable_seqscan) to make sure all data is in cache and not being read
> from disk, here are my results:
>
> test=> set enable_seqscan=on;
> SET
> test=> explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');

Here you should search for "commonterm", not "commonterm80"; "commonterm" will go into a seq-scan. You're not testing the same thing as I did.
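For clarity, the comparison I have in mind would look something like the following (a sketch only, reusing the ftstest table and body_fts column from the scripts; exact plans and timings will of course vary with your data and settings):

```sql
-- "commonterm" matches essentially every row, so with seqscans enabled
-- the planner is expected to pick a sequential scan here.
SET enable_seqscan = on;
EXPLAIN ANALYZE
SELECT id FROM ftstest WHERE body_fts @@ to_tsquery('commonterm');

-- Then disable seqscans to force the index path for the same query,
-- so the two timings are comparing identical result sets.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT id FROM ftstest WHERE body_fts @@ to_tsquery('commonterm');
```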
> Any chance your disk cache was cold on the first test run, so Pg was
> having to read the table from disk during the seqscan, and could just
> use shared_buffers when you repeated the test for the index scan?

They were run repeatedly.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance