On Sat, Jul 17, 2010 at 3:14 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Howard Rogers <hjr@xxxxxxxxxx> writes: >> OK, Tom: I did actually account for the number of rows difference >> before I posted, though I accept I didn't show you that. So here goes: >> ... >> Both queries return zero rows. One takes an awful lot longer than the >> other. The only difference between them is that one searches for >> 'ftx1' and the other searches for 'ftx0'. > > Well, this still doesn't tell us anything about what I think the > critical point is, namely how many actual matches there are for > ftx1 versus ftx0. Could we see counts for *just* those words without > the other conditions? > >> So, I would still like to know if this performance difference when >> encountering alpha-numeric "words" is dictionary-related, > > AFAIK there is no significant difference between treatment of pure alpha > and mixed alphanumeric "words", at least not once you get past > to_tsquery. I'm still expecting this is just a matter of how many index > entries match. It's barely possible that you've got a dictionary > configuration that makes the to_tsquery() function itself a lot slower > in the alphanumeric case, but that should affect ftx1 and ftx0 equally. > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > My apologies for making this harder than it should have been. My queries were correct, my description of them wasn't. The only difference was 'ftx1' and 'ftxa', not 'ftx0'. Anyway, I think I've finally got the point being made by you and Richard (and sorry for being so slow about it!): ftx1 is a very, very common flag. I'd guess 99% of records would be tagged with it. And ftxa doesn't exist at all, of course. So, you're right: ims=# select count(*) from search_rm where to_tsvector('english', textsearch) @@ to_tsquery('english','ball & beach & salsm1'); count ------- 1753 (1 row) Time: 557.010 ms ims=# select count(*) from search_rm where to_tsvector('english', textsearch) @@ to_tsquery('english','ball & beach & salsm4'); count ------- 97 (1 row) Time: 39.518 ms Salsm1 is another very common flag, applied to about 80% of documents. Salsm4 applies to about 160 documents in total. So it's really not the alpha-numerics causing the issue after all. It's genuinely the number of records matching each term, as you originally said. (And a note to Steve: it makes no difference separating out the 'flag factor', I'm afraid). It's given me lots to think about. The hard part ought to be finding the women, or the beaches; the flags should be mere filters applied after those have been fetched. Clearly, though, that's not how this is behaving, though I've been fooled into thinking it ought to be because of the apparent default optimization done in Oracle Text. Some new technique is called for, I guess! (If you've got any ideas, I'm all ears...) Anyway: at least you've all helped me realise that it's not a dictionary problem (or a word-type) problem, so thank you all very much for that; much appreciated. Regards HJR -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general