Thanks again for the reply.
I suspect casting and using octet_length() is not accurate.
Using "extract[ed] text" keyword or summaries would indeed be quick but is not what I'm looking for.
I am inquiring about real-world numbers for full text search of large documents; I'm not sure what more detail you could want.
I'm not demanding anything, just using examples to clarify my inquiry.
I am indeed open to alternatives.
Thank you Kevin, pg_column_size looks like it's exactly what I'm looking for.
pg_column_size(any) int Number of bytes used to store a particular value (possibly compressed)
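For anyone comparing the two approaches, here is a minimal sketch of how they could be run side by side. The table and column names (docs, id, body_tsv) are hypothetical and only for illustration; I am assuming body_tsv is a tsvector column. The text-cast figure measures the printed representation, while pg_column_size() reports the stored (possibly compressed) size, so the two numbers will generally differ.

```sql
-- Hypothetical schema: docs(id, body_tsv tsvector). Names are assumptions, not from the thread.
SELECT id,
       length(body_tsv)             AS lexeme_count,     -- number of lexemes in the tsvector
       octet_length(body_tsv::text) AS text_repr_bytes,  -- size of the text representation
       pg_column_size(body_tsv)     AS stored_bytes      -- bytes used to store the value (possibly compressed)
FROM docs
ORDER BY stored_bytes DESC
LIMIT 10;
```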
On Tue, Jun 14, 2011 at 11:36 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
> I would be surprised if there is no general "how big is this
> object" method in PostgreSQL.

You could cast to text and use octet_length().

> If it's "bad design" to store large text documents (pdf,docx,etc)
> as a BLOBs or on a filesystem and make them searchable with
> tsvectors can you suggest a good design?

Well, I suggested that storing a series of novels as a single entry
seemed bad design to me. Perhaps one entry per novel or even finer
granularity would make more sense in most applications, but there
could be exceptions. Likewise, a list of distinct words is of
dubious value in most applications' text searches. We extract text
from court documents and store a tsvector for each document; we
don't aggregate all court documents for a year and create a
tsvector for that -- that would not be useful for us.

> If making your own search implementation is "better" what is the
> point of tsvectors?

I remember you asking about doing that, but I don't think anyone
else has advocated it.

> Maybe I'm missing something here?

If you were to ask for real-world numbers you'd probably get farther
than demanding that people volunteer their time to perform tests
that you define but don't seem willing to run. Or if you describe
your use case in more detail, with questions about alternative
approaches, you're likely to get useful advice.
-Kevin
On Tue, Jun 14, 2011 at 11:44 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
> You could cast to text and use octet_length().

Or perhaps you're looking for pg_column_size().
http://www.postgresql.org/docs/9.0/interactive/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE
-Kevin