Hi Justin,
> I didn't hear how large the tables and indexes are.
+-------------------------+--------------------------+---------------------------------------------------+
| table_name              | pg_relation_size (bytes) | pg_total_relation_size - pg_relation_size (bytes) |
+-------------------------+--------------------------+---------------------------------------------------+
| TransactionLog_20200213 | 95646646272              | 4175699968                                        |
| TransactionLog_20200212 | 95573344256              | 4133617664                                        |
| TransactionLog_20200211 | 91477336064              | 3956457472                                        |
| TransactionLog_20200210 | 8192000000               | 354344960                                         |
| TransactionLog_20200214 | 6826672128               | 295288832                                         |
| TransactionLog_20200220 | 1081393152               | 89497600                                          |
| pg_catalog.pg_attribute | 3088384                  | 2220032                                           |
| TransactionLog_20190925 | 1368064                  | 90112 (174 such partitions)                       |
+-------------------------+--------------------------+---------------------------------------------------+
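For reference, these figures were gathered with a query along these lines (a sketch; the 'TransactionLog%' name pattern is an assumption about our partition naming):

    -- Sketch: per-relation heap size vs. index+TOAST size, in bytes.
    -- The LIKE pattern is an assumption about the partition naming.
    SELECT relname AS table_name,
           pg_relation_size(oid),
           pg_total_relation_size(oid) - pg_relation_size(oid)
    FROM pg_class
    WHERE relname LIKE 'TransactionLog%'
    ORDER BY pg_relation_size(oid) DESC;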
> Do you mean that a given query is only going to hit 2 partitions ? Or do you
> mean that all but the most recent 2 partitions are "archival" and won't be
> needed by future queries ?
Yes, all queries will hit only 2 partitions (e.g. with daily partitioning, queries will hit only today's and yesterday's partitions).
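For example, with daily range partitions on a timestamp column (a sketch; the parent table and column names are assumptions), a predicate like this lets the planner prune everything except those two partitions:

    -- Sketch: restrict to today and yesterday so the planner can prune
    -- all other daily partitions. Table/column names are assumptions.
    SELECT *
    FROM transaction_log
    WHERE created_at >= CURRENT_DATE - INTERVAL '1 day';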
> You should determine what an acceptable planning speed is, or the best balance
> of planning/execution time. Try to detach half your current partitions and, if
> that gives acceptable performance, then partition by day/2 or more. You could
> make a graph of (planning and total) time vs npartitions, since I think it's
> likely to be nonlinear.
> I believe others have reported improved performance under v11 with larger
> numbers of partitions, by using "partitions of partitions". So you could try
> making partitions by month themselves partitioned by day.
FYI, these are the observations I am getting with various numbers of partitions, and with a multilevel partitioning scheme, compared to the unpartitioned table.
+---------------+-----------------------+----------------+---------------+--------------+---------------+--------------+---------------+--------------+
| Testcase      | Partition Count       | Records in     | Select        | Select       | Update        | Update       | Insert        | Insert       |
|               |                       | each Partition | planning (ms) | execute (ms) | planning (ms) | execute (ms) | planning (ms) | execute (ms) |
+---------------+-----------------------+----------------+---------------+--------------+---------------+--------------+---------------+--------------+
| Single Level  | 6                     | 1000           | 1.162         | 0.045        | 2.112         | 0.115        | 1.261         | 0.178        |
| Partition     | 30                    | 1000           | 2.879         | 0.049        | 5.146         | 0.13         | 1.243         | 0.211        |
|               | 200                   | 1000           | 18.479        | 0.087        | 31.385        | 0.18         | 1.253         | 0.468        |
+---------------+-----------------------+----------------+---------------+--------------+---------------+--------------+---------------+--------------+
| Multi Level   | 6 partitions having   | 1000           | 3.6032        | 0.0695       | x             | x            | x             | x            |
| Partition     | 30 subpartitions each |                |               |              |               |              |               |              |
+---------------+-----------------------+----------------+---------------+--------------+---------------+--------------+---------------+--------------+
| UnPartitioned | NA                    | 430 Million    | 0.0875        | 0.0655       | x             | x            | x             | x            |
+---------------+-----------------------+----------------+---------------+--------------+---------------+--------------+---------------+--------------+
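The multilevel case was set up along these lines (a minimal sketch in PostgreSQL 11+ syntax; table/column names and date ranges are assumptions); the planning and execution timings above are the "Planning Time" and "Execution Time" lines reported by EXPLAIN (ANALYZE).

    -- Sketch: partitions by month, themselves partitioned by day.
    -- Table/column names and ranges are assumptions.
    CREATE TABLE transaction_log (
        id         bigint,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE transaction_log_202002 PARTITION OF transaction_log
        FOR VALUES FROM ('2020-02-01') TO ('2020-03-01')
        PARTITION BY RANGE (created_at);

    CREATE TABLE transaction_log_20200223 PARTITION OF transaction_log_202002
        FOR VALUES FROM ('2020-02-23') TO ('2020-02-24');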
> If you care about INSERT performance, you probably need to make at least a
> single partition's index fit within shared_buffers (or set shared_buffers such
> that it fits). Use transactions around your inserts. If your speed is not
> limited by I/O, you could further use multiple VALUES(),() inserts, or maybe
> prepared statements. Maybe synchronous_commit=off.
>
> If you care about (consistent) SELECT performance, you should consider
> VACUUMing the tables after bulk inserts, to set hint bits (and since
> non-updated tuples won't be hit by autovacuum). Or maybe VACUUM FREEZE to
> freeze tuples (since it sounds like a typical page is unlikely to ever be
> updated).
Sure, I'll evaluate these settings, thanks.
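As a starting point, I understand the batched-insert suggestion roughly like this (a sketch; table/column names are assumptions):

    -- Sketch: one transaction per batch, a multi-row VALUES list, and
    -- synchronous_commit disabled for this transaction only.
    -- Table/column names are assumptions.
    BEGIN;
    SET LOCAL synchronous_commit = off;
    INSERT INTO transaction_log (id, created_at, payload)
    VALUES (1, now(), 'a'),
           (2, now(), 'b'),
           (3, now(), 'c');
    COMMIT;

    -- After a bulk load, VACUUM sets hint bits; FREEZE also freezes
    -- tuples on partitions that will rarely be updated again.
    VACUUM (FREEZE) transaction_log_20200223;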
Thanks and Regards,
Ravi Garg
On Sunday, 23 February, 2020, 08:40:58 pm IST, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
On Sun, Feb 23, 2020 at 10:57:29AM +0000, Ravi Garg wrote:
> - Currently we are thinking of daily partitions; as we need to keep 6 months of data, that means 180 partitions. However, we have the liberty to reduce the number of partitions to weekly/fortnightly/monthly, if we get comparable performance.
I didn't hear how large the tables and indexes are.
> - We need to look current partition and previous partition for all of our use-cases/queries.
Do you mean that a given query is only going to hit 2 partitions ? Or do you
mean that all but the most recent 2 partitions are "archival" and won't be
needed by future queries ?
> Can you please suggest what sort of combinations/partition strategies we can test considering data volume/vacuum etc.? Also let me know if some of the pg_settings can help us tune this (I have attached my pg_settings).
You should determine what an acceptable planning speed is, or the best balance
of planning/execution time. Try to detach half your current partitions and, if
that gives acceptable performance, then partition by day/2 or more. You could
make a graph of (planning and total) time vs npartitions, since I think it's
likely to be nonlinear.
I believe others have reported improved performance under v11 with larger
numbers of partitions, by using "partitions of partitions". So you could try
making partitions by month themselves partitioned by day.
> - Our use case is limited to simple selects (we don't join with other
> tables); however, we are expecting ~70 million records inserted per day,
> and there would be a couple of updates on each record, where the average
> record size would be ~1.5 KB.
> shared_buffers | 1048576
If you care about INSERT performance, you probably need to make at least a
single partition's index fit within shared_buffers (or set shared_buffers such
that it fits). Use transactions around your inserts. If your speed is not
limited by I/O, you could further use multiple VALUES(),() inserts, or maybe
prepared statements. Maybe synchronous_commit=off.
If you care about (consistent) SELECT performance, you should consider
VACUUMing the tables after bulk inserts, to set hint bits (and since
non-updated tuples won't be hit by autovacuum). Or maybe VACUUM FREEZE to
freeze tuples (since it sounds like a typical page is unlikely to ever be
updated).
--
Justin