Thorsten:

On Wed, Feb 10, 2021 at 12:36 PM Thorsten Schöning <tschoening@xxxxxxxxxx> wrote:
> Good day Francisco Olarte,
> on Wednesday, 10 February 2021 at 10:38 you wrote:

> > [...]and partitioning sorts partially
> > by date ( or fully if you have made the partitions by range-querying
> > via index scan ).[...]

> That statement is especially interesting not only because of my
> dump size, but because I'm running into the problem that queries
> spanning more than one partition seem to prefer sequential scans
> over using indexes. My index seems to only be used when querying
> the rows of one partition.

You know the drill: show your indexes and post some EXPLAIN output
for those queries. The statements above are too fuzzy to infer any
useful advice from.

> So, the following is my definition, should that be "range-queried via
> index scan" properly? :-)

> > CREATE TABLE datagram
> > (
> >     id bigserial NOT NULL,
> >     src_re integer NOT NULL,
> >     src_clt integer NOT NULL,
> >     src_meter integer NOT NULL,
> >     captured_at timestamp with time zone NOT NULL,
> >     captured_rssi smallint NOT NULL,
> >     oms_status smallint NOT NULL,
> >     oms_enc bytea,
> >     oms_dec bytea
> > ) PARTITION BY RANGE (captured_at);

> > CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> > CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> > [...]

Given that definition no index scan is possible, as there are no
indexes ( sketch 1 in the P.S. shows how to declare one on the
partitioned table ).

> > INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

But given this, what I was suggesting does not apply. What I meant is
that if you had a big datagram_unpart table indexed by captured_at,
with good index correlation, which given the name of the field would
be expected, and populated the partitions one by one by doing range
queries over datagram_unpart, you might have hit index scans on each
population query and ended up with perfectly sorted partitions, and
that could help compression ( sketch 2 below ).

Given your query, unless the optimizer is performing some
supernatural ( for me ) trickery, my bet would be on a full table
scan plus insertion, which would leave you with whatever order you
had originally in each partition. But if your data, or your "data
shape", has some natural correlation with the captured timestamps
( e.g., I know that my CDR data has it ), putting similar data
together could have improved your compression ratios. Bear in mind I
do not know whether it does.

Anyway, I would not worry too much about the backup ratios with the
sizes you have. I would worry more about things like raising fill
factors ( sketch 3 below ) and similar tweaks, if your data
processing is similar to mine ( data enters at time captured_at, is
sometimes fixed due to errors, but normally never moves after being
in the tables for a couple of months, and when it does it is a
special problem which can be handled manually ). It will not improve
the backup too much, but it can improve other things.

E.g., in my big CDR ( that means call detail record, telephony )
tables I let rows go into "normal" monthly partitions, but once they
are a few months old I coalesce them into fully sorted, unindexed,
100% filled yearly partitions ( sketch 4 below ). Those partitions
are then put into an "historic" schema which I do NOT back up
automatically; I only do that after the ( manually fired ) coalescing
and the ( manually done ) very occasional long-term fixups ( I rarely
have to touch anything older than a month ).

Regards,
Francisco Olarte.
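
P.S.: a few untested sketches of the points above. Anything not taken
from your mail ( the index names, the "historic" schema ) is made up,
so adapt as needed.

1. Since PostgreSQL 11 you can declare an index once on the
partitioned table and it is created on every present and future
partition; EXPLAIN then tells you whether a query spanning several
partitions actually uses it:

    -- One declaration covers all partitions.
    CREATE INDEX datagram_captured_at_idx ON datagram (captured_at);

    -- See what the planner does for a query spanning two partitions.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM datagram
    WHERE captured_at >= '1970-05-01' AND captured_at < '1970-09-01';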
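
2. What I meant by "range-querying": populate each partition with its
own range query over the indexed source table; an explicit ORDER BY
makes the rows arrive sorted even if the planner picks a sequential
scan ( this assumes datagram_unpart has the same column order as
datagram, otherwise list the columns ):

    -- Repeat once per partition range; rows land physically sorted.
    INSERT INTO datagram
    SELECT * FROM datagram_unpart
    WHERE captured_at >= '1970-01-01' AND captured_at < '1970-07-01'
    ORDER BY captured_at;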
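
3. Fill factors: for rows that never move once settled you can pack
pages completely. Heap pages already default to fillfactor 100, but
b-tree index pages default to 90. The setting only affects newly
written pages, so set it before loading or rewrite afterwards ( the
index name below is the auto-generated one, use whatever yours is
called ):

    ALTER TABLE datagram_y1970_h1 SET (fillfactor = 100);
    ALTER INDEX datagram_y1970_h1_captured_at_idx SET (fillfactor = 100);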
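
4. The coalescing into sorted yearly partitions, sketched here on
your two half-year partitions; my real ( monthly ) procedure has more
checks and is fired by hand. Note that if the parent carries a
partitioned index, the ATTACH will build it on the new partition, so
it will not stay unindexed as mine do:

    BEGIN;
    -- New yearly table: no indexes, fully packed pages.
    CREATE TABLE historic.datagram_y1970
        (LIKE datagram) WITH (fillfactor = 100);
    -- Rewrite the whole year fully sorted.
    INSERT INTO historic.datagram_y1970
    SELECT * FROM datagram
    WHERE captured_at >= '1970-01-01' AND captured_at < '1971-01-01'
    ORDER BY captured_at;
    -- Swap: drop the two half-year partitions, attach the yearly one.
    DROP TABLE datagram_y1970_h1, datagram_y1970_h2;
    ALTER TABLE datagram ATTACH PARTITION historic.datagram_y1970
        FOR VALUES FROM ('1970-01-01') TO ('1971-01-01');
    COMMIT;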