Re: FW: Constraint exclusion in partitions

Francisco Olarte <folarte@xxxxxxxxxxxxxx> · Tue, 26 May 2015 08:42:00 +0200

Hi Daniel:

On Mon, May 25, 2015 at 10:03 PM, Daniel Begin <jfd553@xxxxxxxxxxx> wrote:
...
> Even after doing all this, I did not find any improvement in execution times between my original fat table and the partitioned version (sometime even worst). If partitioning the table has improved significantly queries running times, I could have partitioned the tables differently to accommodate other query types I will have to run later in my research (I have the same problem for half a dozen tables).
> Since it does not seem that partitioning will do the job, I will get back to the original table to run my queries...

Well, at least you've learnt some things about it and you can expect
to repeat the measurements faster shoudl you need it.

> However, just in case someone knows a magical trick that can improve significantly the speed of my queries (but haven't told me yet!-) here are the details about the concerned table/indexes
.....
> Table size: 369GB
> Indexes size: 425GB
> I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.

Supposing you can dedicate about 12 Gb to shared buffers / caches,
your caches are going to get trashed often with real work, that is why
we recommended repeating the queries.

Anyway, one last remark. Partition is not magic, it helps a lot
depending on the access records. For workloads like mine they help a
lot  ( call records, where I insert frequently ( so position
correlates strongly with indexes ), nearly never update ( and I work
with high fill factors ), and query frequently for unindexed
conditions plus partition-related ranges they work great ( a big table
forces index scans, which due to correlation are fast, but indexed
anyways, plus filters on extra conditions, partitions normally go to
sequential partition scans plus filters, and sequential scans are way
faster, plus the normal queries go normally to the last things
inserted, so partitions help to keep them cached ).

For queries like the ones you've timed/shown ( partition on an integer
PK of unknown origin, queries for single values ( IN queries are
normally just several single repeated ) or small ranges, big table is
normally gonna beat partition hands down ( except if you have
locality, like you are inserting serials and querying frequently in
the vicinity of the inserted ones, in this case partitions keeps used
tables small and cacheable and may give you a small edge ).

> I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB external drives with write caching.

Well, from your last measurements it seems your disk systems is awful
for database work. I do not know what you mean by external drives (
eSata? firewire? Usb2? usb3? also, any kind of volume management ) but
in your fist query:

> db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
>  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1)
>    Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
>  Total runtime: 288.732 ms
> --Second attempt;
> db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
>  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1)
>    Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
>  Total runtime: 0.056 ms

288 ms for a query which should do 8-10 disk reads seems too slow to
me. And you get nearly the same on the second cache. I would normally
expect <100ms for any reasonable drive, and <50 for any db tuned
disks. I do not remember the exact parameters, but if your samples
reprensent your future workload you need to tune well for disk access
time. It's specially noticeable in the last example ( query with ID
from a subquery ), where you got the times:

> --Select ids-------------------------------------------------------------------------------------------------------------------------------------------------
> --Explain analyse on original table for a query that will look into one partition on the new table but list of ids provided through a select statement
> --First attempt;
>  Total runtime: 2290.122 ms
> --Second attempt;
>  Total runtime: 26.005 ms

Warm caches help you a lot here.....

> --Explain analyse on partitioned table for a query that will look into one partition but list of ids provided through a select statement
> --First attempt;
>  Total runtime: 19142.983 ms
> --Second attempt;
>  Total runtime: 1383.929 ms

And here too, and also, as parition means always hitting more blocks (
see it in reverse, if you collapse partitions maybe some data ends in
the same block and you save some reads, or not, but you will never be
worse ), slow disks hurt you more.

One last remark. I have not seen your data, but from what you've told
and the shown queries I would go for the single table approach hands
down ( maybe with an intermediate weekend / nigt time cluster/vacuum
full/analyze if it is insert a lot- select a lot, rinse, repeat )
unless you are appending / updating / deleting a lot. You are in a
case ( big data, medium machine, slow disks ) where the real access
patterns are what is going to determine your strategy, no boilerplate
solution is going to apply there.

Francisco Olarte.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general