Aleksander Kmetec <aleksander.kmetec@xxxxxxxxx> writes: > I'm looking for a solution for indexing long TEXT columns. We're currently using a HASH index, which can handle most > situations, but every now and then we need support for even longer texts. > One solution would be to create a functional index which would only use the first N chars of mycol, but then we'd have > to change several hundred occurences of "mycol = someval" with "(mycol = someval AND firstN(mycol) = firstN(someval))", > as well as update some SQL generators... > That's why I'd be interested to know if there are any index types available which store only the first N chars or use > some highly compressed form for storing index data, and then recheck any potential hits against the main table. And if > something like that does not exist yet, how difficult would it be to construct such a solution out of many "spare parts" > that come with PG? I think you could do it with GiST. Look at contrib/btree_gist, and make a variant version of its text support in which only the first N characters are stored/compared, then declare all the operators as RECHECK. (Actually, I'm not sure that "<" etc would work correctly in such a scenario, especially in non-C locales; it might be best to declare the operator class as containing only "=".) I don't think you can do it in btree or hash because they assume that the operators are strictly consistent with the support functions, and in fact apply the operators directly in some code paths. So you couldn't use true text equality as the "=" operator, which ruins your chances of not changing your queries. But GiST never touches the operators at all, just the support procs, and you can make the support procs do anything. So it should be possible to make the index work with just the first N characters, and then the RECHECK with true equality would filter out the wrong matches. BTW, if you get something that works well, stick it up on pgfoundry; you're not the first to have asked this ... regards, tom lane