Joshua D. Drake wrote: > Alvaro Herrera wrote: > > Darcy Buskermolen wrote: > >> On October 31, 2006 08:53 am, Teodor Sigaev wrote: > >>>> The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent > >>>> a self contained test case directly to Teodor which shows the error. > >>>> > >>>> 'ERROR: index row requires 8792 bytes, maximum size is 8191' > >>> Uh, I see. But I'm really surprised why do you use pg_trgm on big text? > >>> pg_trgm is designed to find similar words and use technique known as > >>> trigrams. This will work good on small pieces of text such as words or set > >>> expression. But all big texts (on the same language) will be similar :(. > >>> So, I didn't take care about guarantee that index tuple's size limitation. > >>> In principle, it's possible to modify pg_trgm to have such guarantee, but > >>> index becomes lossy - all tuples gotten from index should be checked by > >>> table's tuple evaluation. > >> The problem is some of the data we are working with is not strictly "text" but > >> bytea that we've run through encode(bytea, 'escape'), > > > > I think one good question is why are you storing bytea and then > > searching like it were text. > > We are not storing bytea, a customer is. We are trying to work around > customer requirements. The data that is being stored is not always text, > sometimes it is binary (a flash file or jpeg). We are using escaped text > to be able to search the string contents of that file . Hmm, have you tried to create a functional trigram index on the equivalent of "strings(bytea_column)" or something like that? I imagine strings(bytea) would be a function that returns the concatenation of all pure (7 bit) ASCII strings in the byte sequence. On the other hand, based on Teodor's comment on pg_trgm, maybe this won't be possible at all. > > Why not store the text as text, and put > > the extraneous bytes somewhere else? Certainly you wouldn't expect to > > be able to find text among the bytes, would you? > > Yes we do (and can) expect to find text among the bytes. We have > searches running, we are just running into the maximum size issues for > certain rows. Do you mean you actually find stuff based on text attributes in JPEG images and the like? I thought those were compressed ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.