Hi!

Thanks (you both, Samuel and Craig) for your answers!

On Sun, Jun 19, 2011 at 11:19 AM, Craig James <craig_james@xxxxxxxxxxxxxx> wrote:
> On 6/19/11 4:37 AM, Samuel Gendler wrote:
>
> On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso.camargo@xxxxxxxxx> wrote:
>>
>> Greetings,
>>
>> I have been thinking a lot about pgsql performance when it is dealing
>> with tables with lots of rows in one table (several millions, maybe
>> thousands of millions). Say, the Large Object use case:
>>
>> one table has large objects (it holds a pointer to one object).
>> The large object table stores the large object in 2000-byte chunks
>> (iirc), so, if we have something like 1TB of data stored in large
>> objects, the large objects table would have something like 550M rows;
>> if we get to 8TB, we will have 4400M rows (or so).
>>
>> I have read in several places that huge tables should be partitioned
>> to improve performance... now, my first question comes: does the
>> large objects system automatically partition itself? If not: will
>> Large Objects system performance degrade as we add more data? (I guess
>> it would.)
>
> You should consider "partitioning" your data in a different way: separate
> the relational/searchable data from the bulk data that is merely being
> stored.
>
> Relational databases are just that: relational. The thing they do well is
> to store relationships between various objects, and they are very good at
> finding objects using relational queries and logical operators.
>
> But when it comes to storing bulk data, a relational database is no better
> than a file system.
>
> In our system, each "object" is represented by a big text object of a few
> kilobytes. Searching that text is essentially useless -- the only reason
> it's there is for visualization and to pass on to other applications. So
> it's separated out into its own table, which only has the text record and a
> primary key.

Well, my original schema does exactly that (I mimic the LO schema):

files (searchable): id, name, size, hash, mime_type, number_chunks
files_chunks: id, file_id, hash, chunk_number, data (bytea)

So, my bulk data is in the files_chunks table, but because each chunk is restricted (by me) to 2000 bytes, the total number of rows in files_chunks can get *huge*.

The system would search the files table, and then read the matching rows from files_chunks (to get each of the chunks and, maybe, send them out to the web client).

So, with a prospect of ~4400M rows for that table, I really thought it could be a good idea to partition files_chunks (a rough sketch of what I have in mind is at the end of this mail). Because I'm thinking of relatively small files (<100MB), table partitioning should do great here: I could arrange for all of the chunks of a given file to end up in the same partition. Even if the system were to get larger files (>5GB), this approach should still work.

The original question was about Large Objects and partitioning... see, according to the documentation: http://www.postgresql.org/docs/9.0/static/lo-intro.html

"All large objects are placed in a single system table called pg_largeobject."

So, the question is: if I were to store 8TB worth of data in the large objects system, wouldn't that actually make the pg_largeobject table slow, unless it were automatically partitioned somehow?

Thanks for taking the time to discuss this matter with me!

Sincerely,

Ildefonso Camargo
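
P.S. Here is the rough sketch I mentioned above, in case it helps to make the idea concrete. It assumes PostgreSQL 9.0-style partitioning by inheritance (CHECK constraints on the children plus an INSERT trigger on the parent); the partition names, the file_id ranges, and the column types are just made up for illustration -- only the column list comes from my schema above:

-- Parent table: holds no rows itself, only defines the schema.
CREATE TABLE files_chunks (
    id           bigserial,
    file_id      bigint  NOT NULL,
    hash         bytea,
    chunk_number integer NOT NULL,
    data         bytea
);

-- Children partitioned by ranges of file_id, so every chunk of a
-- given file lands in the same partition.
CREATE TABLE files_chunks_p00 (
    CHECK (file_id >= 0 AND file_id < 10000000)
) INHERITS (files_chunks);

CREATE TABLE files_chunks_p01 (
    CHECK (file_id >= 10000000 AND file_id < 20000000)
) INHERITS (files_chunks);

-- Each partition needs its own index.
CREATE INDEX files_chunks_p00_file_idx
    ON files_chunks_p00 (file_id, chunk_number);
CREATE INDEX files_chunks_p01_file_idx
    ON files_chunks_p01 (file_id, chunk_number);

-- Route INSERTs on the parent to the right child.
CREATE OR REPLACE FUNCTION files_chunks_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.file_id < 10000000 THEN
        INSERT INTO files_chunks_p00 VALUES (NEW.*);
    ELSIF NEW.file_id < 20000000 THEN
        INSERT INTO files_chunks_p01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for file_id %', NEW.file_id;
    END IF;
    RETURN NULL;  -- the row has already been stored in the child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER files_chunks_insert_trg
    BEFORE INSERT ON files_chunks
    FOR EACH ROW EXECUTE PROCEDURE files_chunks_insert();

With constraint_exclusion = partition (the 9.0 default), a query such as

SELECT data FROM files_chunks WHERE file_id = 42 ORDER BY chunk_number;

should only have to scan files_chunks_p00, which is what I hope will keep things usable even at ~4400M total rows.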