Hi Daniel:

On Mon, May 25, 2015 at 10:03 PM, Daniel Begin <jfd553@xxxxxxxxxxx> wrote:
...
> Even after doing all this, I did not find any improvement in execution times between my original fat table and the partitioned version (sometimes even worse). If partitioning the table had significantly improved query running times, I could have partitioned the tables differently to accommodate other query types I will have to run later in my research (I have the same problem for half a dozen tables).
> Since it does not seem that partitioning will do the job, I will get back to the original table to run my queries...

Well, at least you've learnt some things about it, and you can expect to repeat the measurements faster should you need to.

> However, just in case someone knows a magical trick that can significantly improve the speed of my queries (but haven't told me yet!-) here are the details about the concerned table/indexes .....
> Table size: 369GB
> Indexes size: 425GB
> I am running all this on my personal PC: Windows 64b, i7 chip, 16GB ram.

Supposing you can dedicate about 12 GB to shared buffers / caches, your caches are going to get trashed often with real work; that is why we recommended repeating the queries.

Anyway, one more remark. Partitioning is not magic; how much it helps depends on the access patterns. For workloads like mine it helps a lot: call records, where I insert frequently ( so physical position correlates strongly with the indexes ), nearly never update ( and I work with high fill factors ), and query frequently for unindexed conditions plus partition-related ranges. There the big table forces index scans plus filters on the extra conditions ( fast, thanks to the correlation, but index scans anyway ), while the partitioned version normally goes to sequential scans of one partition plus filters, and those sequential scans are way faster; also, the usual queries mostly hit the last things inserted, so partitions help to keep them cached.

For queries like the ones you've timed/shown ( partitioning on an integer PK of unknown origin, queries for single values ( IN queries are normally just several single lookups repeated ) or small ranges ), the big table is normally going to beat the partitioned one hands down ( except if you have locality, like inserting serial values and querying frequently in the vicinity of the ones just inserted; in that case partitioning keeps the hot tables small and cacheable and may give you a small edge ).

> I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB external drives with write caching.

Well, from your last measurements it seems your disk system is awful for database work. I do not know what you mean by external drives ( eSATA? FireWire? USB2? USB3? also, any kind of volume management? ), but in your first query:

> db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
> Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1)
>   Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
> Total runtime: 288.732 ms
> --Second attempt;
> db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
> Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1)
>   Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
> Total runtime: 0.056 ms

288 ms for a query which should do 8-10 disk reads seems too slow to me, and you get nearly the same in the second case. I would normally expect <100 ms for any reasonable drive, and <50 ms for disks tuned for database work.
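If you want to see where those 288 ms really go, you can ask EXPLAIN for buffer counts and I/O timings. A minimal sketch, reusing your first query ( the BUFFERS option and track_io_timing are both there in 9.3; track_io_timing needs a superuser to turn on, which on your own PC you presumably are ):

  -- separate cache hits from real disk reads, and time the reads
  SET track_io_timing = on;

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM old_nodes
  WHERE id IN (10005000, 1000005000, 2000005000, 3000005000);

The "Buffers: shared hit=... read=..." line tells you how many blocks came from cache versus disk, and "I/O Timings: read=..." how much of the runtime was spent waiting for the disk; if nearly all of the 288 ms is in there for a handful of reads, the drives really are the problem.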
I do not remember the exact parameters off-hand, but if your samples represent your future workload you need to tune well for disk access time. It's especially noticeable in the last example ( the query whose IDs come from a subquery ), where you got these times:

> --Select ids--------------------------------------------------------------
> --Explain analyse on original table for a query that will look into one partition on the new table but list of ids provided through a select statement
> --First attempt;
> Total runtime: 2290.122 ms
> --Second attempt;
> Total runtime: 26.005 ms

Warm caches help you a lot here...

> --Explain analyse on partitioned table for a query that will look into one partition but list of ids provided through a select statement
> --First attempt;
> Total runtime: 19142.983 ms
> --Second attempt;
> Total runtime: 1383.929 ms

And here too. Also, as partitioning always means hitting more blocks ( see it in reverse: if you collapse the partitions, maybe some rows end up sharing a block and you save some reads, or maybe not, but you will never be worse off ), slow disks hurt you more there.

One last remark. I have not seen your data, but from what you've told and the queries shown, I would go for the single-table approach hands down ( maybe with an intermediate weekend / night-time CLUSTER / VACUUM FULL / ANALYZE if the pattern is insert a lot, select a lot, rinse, repeat ) unless you are appending / updating / deleting a lot. You are in a case ( big data, medium machine, slow disks ) where the real access patterns are what will determine your strategy; no boilerplate solution is going to apply there.

Francisco Olarte.
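P.S. In case it is useful, this is roughly the off-hours maintenance pass I meant, sketched against your old_nodes table and its PK index ( just an illustration; CLUSTER rewrites the whole table in index order and holds an exclusive lock while it runs, so on a 369 GB table it is strictly a night / weekend job, and it already discards dead rows so you do not need a separate VACUUM FULL afterwards ):

  -- give the rewrite plenty of sort memory for this session; adjust to your RAM
  SET maintenance_work_mem = '2GB';

  -- rewrite the table in primary-key order so the heap correlates with the index again
  CLUSTER old_nodes USING nodes_idversion_pk;

  -- refresh planner statistics after the rewrite
  ANALYZE old_nodes;

After that, lookups on nearby ids touch fewer blocks, which is what keeps the big-table approach bearable on slow disks.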