Thorsten:

On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning <tschoening@xxxxxxxxxx> wrote:
...
> I've changed the new table "datagram" to be a partitioned one with
> partitions containing rows per year and per half-year. Each partition
> contains far fewer rows than before this way, and while I only tested
> with "--compress=9" this time, the numbers are quite interesting:
>
> > unpartitioned:     6,4 GiB
> > half-yearly parts: 4,8 GiB
> > yearly parts:      4,8 GiB
>
> The interesting part this time is that the table layout for all
> partitions is the same as before, only the number of rows per table
> is different. The overall number of rows is also the same as before,
> with the same data, IDs etc. Yet this time the dump really is smaller
> than with the OLD schema, which contained far more data because of
> duplicate IDs and such.

I wouldn't have expected the table layout to be that important.
Compression depends on detectable redundancy in the input, and pg_dump
more or less gzips per-table "COPY ... TO STDOUT" output. If your data
evolves in a predictable way, having it sorted by ingestion time may
increase the detectable redundancy a lot, and partitioning sorts the
data partially by date (or fully, if you populated the partitions by
range-querying via an index scan). So it may not be the layout that
matters here, but the row order.

Since you seem to be able to test, you could try sorting the full,
unpartitioned table by the column you use for partitioning and dumping
it again. IIRC CLUSTER will do the trick if that column is indexed.

(This has happened to me when compressing document dumps: presorting by
some chosen fields improved my compression ratio a bit. IIRC that was
because the compressor searches for duplicates only within a limited
window, which is why I say "detectable" redundancy.)

Francisco Olarte.
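
P.S. A minimal sketch of that test, in case it helps. The table and
column names below are placeholders, not from your schema: I'm assuming
an unpartitioned copy of the table, here called "datagram_flat", and a
timestamp column "created_at" as the partitioning key.

    -- CLUSTER needs an index on the sort column; create one if missing.
    CREATE INDEX IF NOT EXISTS datagram_flat_created_at_idx
        ON datagram_flat (created_at);

    -- Physically rewrite the table in index order.
    CLUSTER datagram_flat USING datagram_flat_created_at_idx;

Then dump just that table again with the same compression level, e.g.
something like "pg_dump --compress=9 -t datagram_flat -f
datagram_sorted.gz yourdb", and compare sizes. If the row order is what
matters, the sorted dump should come out closer to the 4,8 GiB of the
partitioned layout than to the 6,4 GiB of the unsorted one.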