On Wed, Jul 28, 2010 at 02:05:47PM -0500, P Kishor wrote:
> each row is half a dozen single byte values, so, it is actually 6
> bytes per row (six columns). Even if I combine them somehow, still the
> per row overhead (which, I believe, is about 23 bytes) is more than
> the data. But, that is not the issue.

I had a design like that for an application too. I thought it was not an
issue, but the row overhead causes memory and disk usage to skyrocket,
and will cause queries to slow to a crawl.

The solution for me was to group my values logically and store them
together in the same row. In my case, this worked by storing all the
values for one measuring point (timestamp) in an array field, with the
meaning of each array index kept in a bookkeeping table (each measuring
moment produced the same number of values for me, so I was able to do
this).

Extracting one value from a long array (some of my datasets include
thousands of values per measuring moment) is extremely fast. You can
also easily create indices on the array dereferences you need to search
on, if those are always the same. (I've put a rough sketch of this kind
of layout near the end of this mail.)

> First, I can't really merge
> several days into one row. While it might make for fewer rows, it will
> complicate my data extraction and analysis life very complicated.

Perhaps you could put all days of a month in an array, indexed by day of
the month? That wouldn't be too hard for your logic to deal with, I
think.

> The real issue is that once I put a 100 million rows in the table,
> basically the queries became way too slow.

I had the same issue. Partitioning falls flat on its face once you're
dealing with such insane amounts of data. In my experience, if your set
of partitions isn't constant and keeps growing, you will face problems
sooner or later. If you do partitioning the traditional way by
inheriting the table, you'll also run into additional trouble, since for
some operations Postgres needs to obtain a lock on every partition, and
that will easily cause you to run out of shared memory. You can increase
max_locks_per_transaction, but that's unworkable if the number of
partitions keeps growing: you need to keep increasing that value all the
time...

> Of course, I could (and should) upgrade my hardware -- I am using a
> dual Xeon 3 GHz server with 12 GB RAM, but there are limits to that route.

Always try to solve it by changing your data design first, unless what
you're trying to do is fundamentally limited by hardware. You're not
likely to request all those records at once, nor will you need to search
through all of them; try to come up with a sane way of quickly slicing
your data down to a smaller set which can be retrieved quickly.

> Keep in mind, the circa 100 million rows was for only part of the db.
> If I were to build the entire db, I would have about 4 billion rows
> for a year, if I were to partition the db by years. And, partitioning
> by days resulted in too many tables.

Yeah, that sounds similar to the troubles I ran into in my project.

> I wish there were a way around all this so I could use Pg, with my
> available resources, but it looks bleak right now.

Try the array approach. You could create columns for each week or month
in a year and store the individual days in an array in each column;
extracting those shouldn't be too hard. You could store the different
types of data you have in separate rows, one per unit of information you
want to store for a day. Alternatively, store all your data points for a
day in one row, with one row per day.
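To make that a bit more concrete, here is a minimal sketch of the kind
of layout I mean. The table and column names (channel, readings,
station_id, vals and so on) are just made up for illustration, and since
Postgres has no single-byte integer type I'm using smallint as the
closest fit for your six values:

    -- Bookkeeping table: which position in the array holds which value.
    CREATE TABLE channel (
        channel_index integer PRIMARY KEY,  -- position within the vals array
        description   text    NOT NULL
    );

    -- One row per station and day; all values for that day in one array.
    CREATE TABLE readings (
        station_id  integer    NOT NULL,
        reading_day date       NOT NULL,
        vals        smallint[] NOT NULL,
        PRIMARY KEY (station_id, reading_day)
    );

    -- Pulling a single value back out is cheap, even from a long array
    -- (array subscripts are 1-based).
    SELECT vals[3]
      FROM readings
     WHERE station_id = 42 AND reading_day = '2010-07-28';

    -- If you always search on the same dereference, an expression index
    -- on that element keeps those searches fast.
    CREATE INDEX readings_vals_3_idx ON readings ((vals[3]));

The days-of-a-month variant is the same idea one level up: one row per
station and month, with vals[1] through vals[31] holding the daily
values.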
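And if you still want to partition on top of that, it stays manageable
as long as the set of partitions is small and fixed. With the usual
inheritance-based setup it would look roughly like this (again, made-up
names, and don't forget constraint_exclusion so the planner can skip
partitions):

    CREATE TABLE readings_2009 (
        CHECK (reading_day >= '2009-01-01' AND reading_day < '2010-01-01')
    ) INHERITS (readings);

    CREATE TABLE readings_2010 (
        CHECK (reading_day >= '2010-01-01' AND reading_day < '2011-01-01')
    ) INHERITS (readings);

    -- Insert into the child tables directly (or route rows via a trigger).
    -- With constraint_exclusion enabled the planner only scans the
    -- partitions whose CHECK constraint can match the WHERE clause.
    SET constraint_exclusion = on;

    SELECT vals[3]
      FROM readings
     WHERE reading_day BETWEEN '2010-01-01' AND '2010-01-31';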
You could easily start partitioning historical data per year or per
decade.

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer is especially
 attractive, not only because it can be economically and scientifically
 rewarding, but also because it can be an aesthetic experience much like
 composing poetry or music."
                                                        -- Donald Knuth