> Hi all,
>
> I am trying to move my app from M$sql to PGsql, but i need a bit of help :)

Apart from all the other good advice about partitioning the dataset and such, there is another aspect to keep in mind. When you have a large dataset and your queries become I/O-bound, the "tuple density" is going to hit you in two ways.

Your dataset seems to have a natural clustering around time, which is also what you would use for the partitioning. That also means that if you roughly have that clustering of the data on disk, the tuples you need to satisfy a query will sit on the same page, or on pages close to it.

The cost of checking visibility for a tuple is to some degree a function of the tuple size, so anything you can do to increase the tuple density will most likely benefit speed in two ways:

* You increase the likelihood that the next tuple is on the same page, so fetching it does not cost a random I/O seek.
* You increase the total number of tuples sitting in your system cache in the same amount of pages (memory), so those do not cost random I/O seeks either.

So: if you are carrying around columns you don't really need, throw them away (that could be columns that can trivially be computed from other columns), but you need to do your own testing here.

To stress the first point, here is a sample run on a fairly old desktop with one SATA drive. testtable has "id integer" and a "data" column holding 486 bytes of text; testtable2 has "id integer" and an integer "data". Both were filled with 10M tuples, and PG was restarted and the OS caches dropped beforehand to simulate a totally disk-bound system. (A sketch of the setup follows at the end of this mail.)

testdb=# select count(id) from testtable where id > 8000000 and id < 8500000;
 count
--------
 499999
(1 row)

Time: 7909.464 ms

testdb=# select count(id) from testtable2 where id > 8000000 and id < 8500000;
 count
--------
 499999
(1 row)

Time: 2149.509 ms

In this sample the narrow table is almost 4 times faster, even though the query never touches the "data" column. (On better hardware you will most likely see better results.)

If the columns are needed, you can push the less frequently used ones out to a 1:1 relation (see the sketch at the end of this mail). That gives you some administrative overhead, but it lets you decide at query time whether you want to pay the extra random seeks to access that data. You have the same picture the other way around: if your queries access data sitting in TOAST, you pay a double random I/O cost for getting the tuple. So it is definitely a trade-off that should be handled with care.

I've monkey-patched my own PG using this patch to toy around with the criteria for sending the less frequently used data to a TOAST table:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/135158/match=

Google "vertical partitioning" for more; this is basically what it is. (I believe this could benefit my own application, so I'm also trying to push some interest into the area.)

-- 
Jesper
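
PS: For anyone who wants to reproduce the narrow-vs-wide test: the timings above are from my run, but the DDL below is a reconstruction, and the use of repeat()/generate_series() plus the index on "id" are just my assumptions about one way to build it:

  CREATE TABLE testtable  (id integer, data text);
  CREATE TABLE testtable2 (id integer, data integer);

  -- 10M rows; the wide table carries 486 bytes of text per tuple,
  -- small enough to stay inline in the heap rather than being TOASTed
  INSERT INTO testtable
      SELECT i, repeat('x', 486) FROM generate_series(1, 10000000) AS i;
  INSERT INTO testtable2
      SELECT i, i FROM generate_series(1, 10000000) AS i;

  CREATE INDEX testtable_id_idx  ON testtable (id);
  CREATE INDEX testtable2_id_idx ON testtable2 (id);
  ANALYZE testtable;
  ANALYZE testtable2;

After loading, comparing pg_relation_size('testtable') with pg_relation_size('testtable2') shows directly how many more pages the wide table occupies. To get cold-cache numbers, restart PG and drop the OS page cache between runs (on Linux: echo 3 > /proc/sys/vm/drop_caches as root).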
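
And a rough sketch of the 1:1 split ("vertical partitioning") I mentioned. The table and column names here are made up purely for illustration:

  -- the "hot" columns that most queries touch stay in a narrow table
  CREATE TABLE readings (
      id     integer PRIMARY KEY,
      ts     timestamptz NOT NULL,
      value  integer NOT NULL
  );

  -- rarely used columns are pushed out to a 1:1 side table
  CREATE TABLE readings_extra (
      id        integer PRIMARY KEY REFERENCES readings(id),
      raw_line  text,
      note      text
  );

  -- queries on the hot columns keep their high tuple density:
  SELECT count(*) FROM readings
   WHERE ts >= '2010-06-01' AND ts < '2010-07-01';

  -- and you decide at query time whether to pay the extra random seeks:
  SELECT r.id, r.value, e.raw_line
    FROM readings r
    JOIN readings_extra e ON e.id = r.id
   WHERE r.id = 42;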
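
On the TOAST side: stock PG already lets you nudge the per-column storage strategy. To be clear, this is the built-in knob, not the patched criteria from the gmane link, and it only affects tuples written after the change:

  -- allow this column out of line (uncompressed) when the tuple gets too big
  ALTER TABLE testtable ALTER COLUMN data SET STORAGE EXTERNAL;

  -- or prefer compression and keep the column inline as long as possible
  ALTER TABLE testtable ALTER COLUMN data SET STORAGE MAIN;

The catch is that TOASTing only kicks in once the tuple exceeds roughly 2 kB, so this does nothing for lots of smallish columns, which is exactly why I'm toying with other criteria in the patch above.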