On Mon, Mar 23, 2020 at 02:35:48PM +0100, Laurenz Albe wrote:
> On Fri, 2020-03-20 at 22:50 +0100, Kurt Roeckx wrote:
> > I have a few tables that are append only. There are only insert
> > and select queries, never update or delete.
> >
> > What I see is that every file is still being updated. It's
> > currently about 500 GB big, and each of those almost 500 files has
> > been touched in the past 24 hours.
> >
> > I assume that the free space map is being used, and that it
> > still finds places where it can insert a row in one of the files.
> >
> > (auto) vacuum is not happening on the table.
>
> This is probably the first reader setting hint bits on the table rows.
>
> To determine whether a row is visible or not, the first reader has
> to consult the commit log to see if the xmin and xmax special columns
> of the row belong to committed transactions or not.
>
> To make life easier for future readers, it will then set special
> flags on the row that provide that information without the requirement
> to consult the commit log.
>
> This modifies the row, even if the data don't change, and the row
> has to be written again.
>
> > Is there a way I can turn off this behaviour, so that it really
> > only writes to the last few pages?
>
> You can explicitly read or vacuum the new rows; that will set the
> hint bits.
>
> But, as has been explained, at some point the table will have to receive
> an anti-wraparound vacuum that will freeze old rows.
>
> So the best you can do is to VACUUM (FREEZE) the table after you load
> data. Then the table should not be modified any more.

I did a normal vacuum, and it seems to be behaving better; it's not
writing all over the old files anymore. I think I'll set
autovacuum_freeze_max_age a lot lower than the default 200 M. Note
that this is not a static table, it will always be adding more rows.

The behaviour I'm still seeing is that it reads the table all over
during the import of new data. I assume that is also what caused the
writes before. I would really like to avoid all the random reads, but
I'm not sure it can use the data from the index alone to avoid reading
the data file itself.
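Concretely, for the freeze settings I have something like the following
in mind. The threshold is only a first guess and I haven't tried it
yet; the idea, as I understand Laurenz's suggestion, is that rows that
get frozen right after a load should not need to be rewritten later by
an anti-wraparound vacuum:

ALTER TABLE raw_certificates SET (autovacuum_freeze_max_age = 10000000);

-- and after each bulk load, freeze the newly inserted rows explicitly
VACUUM (FREEZE, VERBOSE) raw_certificates;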
My table looks like this:

                                 Table "public.raw_certificates"
       Column       |  Type   | Collation | Nullable |                   Default
--------------------+---------+-----------+----------+---------------------------------------------
 certificate        | bytea   |           | not null |
 id                 | bigint  |           | not null | nextval('raw_certificate_id_seq'::regclass)
 sha256_fingerprint | bytea   |           | not null |
 pre_certificate    | boolean |           | not null |
Indexes:
    "raw_certificates_pkey" PRIMARY KEY, btree (id)
    "raw_certificates_sha256_fingerprint_key" UNIQUE CONSTRAINT, btree (sha256_fingerprint)
Referenced by:
    TABLE "certificates" CONSTRAINT "certificates_raw_certificate_id_fkey" FOREIGN KEY (raw_certificate_id) REFERENCES raw_certificates(id)
    TABLE "ct_entry" CONSTRAINT "ct_entry_raw_certificate_id_fkey" FOREIGN KEY (raw_certificate_id) REFERENCES raw_certificates(id)

To import data into it, I currently do:

CREATE TEMP TABLE import_certs (certificate bytea NOT NULL, sha256_fingerprint bytea);

COPY import_certs (certificate) FROM stdin;

UPDATE import_certs SET sha256_fingerprint = digest(certificate, 'sha256');

INSERT INTO raw_certificates (sha256_fingerprint, certificate, pre_certificate)
  SELECT i.sha256_fingerprint, i.certificate, false
  FROM import_certs AS i
  ON CONFLICT DO NOTHING;

The behaviour I currently see is:
- It does a read from a raw_certificates_sha256_fingerprint_key file,
  then from a raw_certificates file, then again from
  raw_certificates_sha256_fingerprint_key, repeating this about 5 times.
- Then it does a write and a read on the import_certs table.

I guess that after reading from the index, it also needs to check the
table to see whether the row is still visible, or something like that.
Is there a way to avoid this? The write to import_certs is very
confusing to me (but see the P.S. below).

Anyway, the major reason for the sha256_fingerprint field is just to
remove duplicates; I only want one copy of each certificate in that
table.

Does anybody have a suggestion on how to improve the performance? Once
I catch up with all the old data again, I expect this table alone to
be in the order of 10 TB, and to grow at around 250 GB / month. And I
think I need to start considering moving it to SSDs to keep up.

Kurt
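P.S.: One thing I still want to try, in case the write to import_certs
really does come from the separate UPDATE, is to compute the digest
directly in the INSERT and drop the extra column from the temp table.
Untested sketch:

CREATE TEMP TABLE import_certs (certificate bytea NOT NULL);

COPY import_certs (certificate) FROM stdin;

-- compute the fingerprint on the fly instead of rewriting import_certs first
INSERT INTO raw_certificates (sha256_fingerprint, certificate, pre_certificate)
  SELECT digest(i.certificate, 'sha256'), i.certificate, false
  FROM import_certs AS i
  ON CONFLICT DO NOTHING;

That should at least avoid rewriting every row of the temp table once
per batch; whether it changes anything about the reads on
raw_certificates, I don't know.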