Re: Large rows number, and large objects

Hi!

Thanks to you both, Samuel and Craig, for your answers!

On Sun, Jun 19, 2011 at 11:19 AM, Craig James
<craig_james@xxxxxxxxxxxxxx> wrote:
> On 6/19/11 4:37 AM, Samuel Gendler wrote:
>
> On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso.camargo@xxxxxxxxx> wrote:
>>
>> Greetings,
>>
>> I have been thinking a lot about pgsql performance when it is dealing
>> with tables with lots of rows (several million, maybe thousands of
>> millions).  Say, the Large Object use case:
>>
>> one table references large objects (each row has a pointer to one object).
>> The large object table stores each large object in 2000-byte chunks
>> (iirc), so if we have something like 1TB of data stored in large
>> objects, the large object table would have something like 550M rows;
>> if we get to 8TB, we will have 4400M rows (or so).
>>
>> I have read in several places that huge tables should be partitioned
>> to improve performance.... now, my first question: does the large
>> object system automatically partition itself? If not, will large
>> object system performance degrade as we add more data? (I guess it
>> would.)
>
> You should consider "partitioning" your data in a different way: Separate
> the relational/searchable data from the bulk data that is merely being
> stored.
>
> Relational databases are just that: relational.  The thing they do well is
> to store relationships between various objects, and they are very good at
> finding objects using relational queries and logical operators.
>
> But when it comes to storing bulk data, a relational database is no better
> than a file system.
>
> In our system, each "object" is represented by a big text object of a few
> kilobytes.  Searching that text file is essentially useless -- the only reason
> it's there is for visualization and to pass on to other applications.  So
> it's separated out into its own table, which only has the text record and a
> primary key.

Well, my original schema does exactly that (I mimic the LO schema):

files (searchable): id, name, size, hash, mime_type, number_chunks
files_chunks : id, file_id, hash, chunk_number, data (bytea)
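
For concreteness, here is a minimal DDL sketch of that schema; the
column types and constraints are my assumptions, since the post above
only lists column names:

-- Hypothetical DDL for the schema sketched above.
CREATE TABLE files (
    id            bigserial PRIMARY KEY,
    name          text    NOT NULL,
    size          bigint  NOT NULL,
    hash          text    NOT NULL,
    mime_type     text    NOT NULL,
    number_chunks integer NOT NULL
);

CREATE TABLE files_chunks (
    id           bigserial PRIMARY KEY,
    file_id      bigint  NOT NULL REFERENCES files(id),
    hash         text    NOT NULL,
    chunk_number integer NOT NULL,
    data         bytea   NOT NULL,  -- capped at 2000 bytes by the application
    UNIQUE (file_id, chunk_number)
);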

So, my bulk data is in the files_chunks table, but because I restrict
each chunk to 2000 bytes, the total number of rows in the files_chunks
table can get *huge*.

So, the system would search the files table, and then the files_chunks
table (to fetch each of the chunks and, maybe, send them out to the web
client).
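
In query terms (a sketch, assuming the column names above), serving one
file back out is just:

SELECT data
FROM files_chunks
WHERE file_id = $1
ORDER BY chunk_number;

With an index on (file_id, chunk_number) -- which the UNIQUE constraint
in the sketch above already provides -- this stays an index scan no
matter how large the table grows.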

So, with a prospect of ~4500M rows for that table, I really thought it
could be a good idea to partition the files_chunks table.  Since I'm
expecting relatively small files (<100MB), table partitioning should
work well here, because I could arrange for all of the chunks of a
given file to land in the same partition.  Even if the system later had
to handle larger files (>5GB), this approach should still work.
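
For illustration only: on 9.0 this would have to be built by hand with
table inheritance, CHECK constraints, and an insert trigger, but
hashing on file_id is the idea.  In later PostgreSQL releases (11+,
where declarative hash partitioning exists) the same table could be
declared like this:

-- Same files_chunks table as above, this time hash-partitioned.
CREATE TABLE files_chunks (
    id           bigserial,
    file_id      bigint  NOT NULL,
    hash         text    NOT NULL,
    chunk_number integer NOT NULL,
    data         bytea   NOT NULL,
    PRIMARY KEY (file_id, chunk_number)  -- must include the partition key
) PARTITION BY HASH (file_id);

-- 16 partitions; every chunk of a given file hashes to the same one.
CREATE TABLE files_chunks_p0 PARTITION OF files_chunks
    FOR VALUES WITH (MODULUS 16, REMAINDER 0);
-- ...and so on, up to REMAINDER 15.

Because the hash is computed from file_id alone, a file's chunks can
never be scattered across partitions, regardless of the file's size.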

The original question was about Large Objects and partitioning...
see, according to the documentation:
http://www.postgresql.org/docs/9.0/static/lo-intro.html

"All large objects are placed in a single system table called pg_largeobject."

So, the question is: if I were to store 8TB worth of data in the large
object system, would the pg_largeobject table actually become slow,
unless it were automatically partitioned?
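
A quick way to watch that single table grow (a sketch; pg_largeobject
stores one row per 2048-byte chunk by default, and reading it directly
may require superuser privileges depending on version):

-- Total on-disk size of the large object table (heap + indexes + toast):
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));

-- One row per chunk, so roughly (total LO bytes / 2048) rows:
SELECT count(*) FROM pg_largeobject;

There is no automatic partitioning here: pg_largeobject is one ordinary
(if very large) table with a btree index on (loid, pageno), so lookups
stay logarithmic, but whole-table maintenance (VACUUM, REINDEX, dumps)
has to touch all 8TB at once.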

Thanks for taking the time to discuss this matter with me!

Sincerely,

Ildefonso Camargo
