On Sun, 2020-09-13 at 13:49 +0200, Mike Noordermeer wrote: > Hi, > > I am currently working on a data sync solution, where data is synced > from external systems to Postgres. As I do not know what data changed > in the source data store, I have opted for the following flow: > > - Load all data in a temp table > - Update existing rows main table based on contents of temp table > - Insert new rows main table > - Delete obsolete rows main table > > Obviously, I want to minimize WAL logging and data writes. Therefore I > have added suppress_redundant_updates_trigger to the main table. > > Nevertheless, I am still seeing pages being dirtied and written to the > WAL, even if no rows changed. Is there anyone that could explain why > this is happening? And does anyone know of better ways to minimize > writes in such a 'data load/refresh' scenario, where data is the same > in 99.9% of the cases? > > Small test case on Postgres 12.4 below, as you can see it dirties 8334 blocksk: > > testdatabase=# create table testtable (id int not null, name > varchar(32), primary key(id)); > CREATE TABLE > testdatabase=# create temp table testtable_temp (id int not null, name > varchar(32), primary key(id)); > CREATE TABLE > testdatabase=# insert into testtable (id, name) select i, md5(i::text) > from generate_series(1,1000000) s(i); > INSERT 0 1000000 > testdatabase=# insert into testtable_temp select * from testtable; > INSERT 0 1000000 > testdatabase=# create trigger z_min_update before update on testtable > for each row execute procedure suppress_redundant_updates_trigger(); > CREATE TRIGGER > testdatabase=# checkpoint; > CHECKPOINT > testdatabase=# explain (analyze, buffers) update testtable as d set > name = s.name from testtable_temp as s where d.id = s.id; > > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------- > Update on testtable d (cost=30315.56..69708.58 rows=591714 width=98) > (actual time=4168.901..4168.903 rows=0 loops=1) > Buffers: shared hit=1008337 dirtied=8334, local read=8334 > dirtied=8334 written=8331, temp read=11144 written=11144 > -> Hash Join (cost=30315.56..69708.58 rows=591714 width=98) > (actual time=486.771..1429.637 rows=1000000 loops=1) > Hash Cond: (d.id = s.id) > Buffers: shared hit=8337, local read=8334 dirtied=8334 > written=8331, temp read=11144 written=11144 > -> Seq Scan on testtable d (cost=0.00..18334.00 > rows=1000000 width=10) (actual time=0.026..157.729 rows=1000000 > loops=1) > Buffers: shared hit=8334 > -> Hash (cost=14251.14..14251.14 rows=591714 width=92) > (actual time=486.128..486.129 rows=1000000 loops=1) > Buckets: 32768 Batches: 32 Memory Usage: 2583kB > Buffers: local read=8334 dirtied=8334 written=8331, > temp written=7549 > -> Seq Scan on testtable_temp s (cost=0.00..14251.14 > rows=591714 width=92) (actual time=0.035..226.642 rows=1000000 > loops=1) > Buffers: local read=8334 dirtied=8334 written=8331 > Planning Time: 0.429 ms > Trigger z_min_update: time=57.069 calls=1000000 > Execution Time: 4174.785 ms > (15 rows) These are probably the "hint bits" set on newly committed rows by the first reader. Note that te blocks are dirtied during the sequential scan, not during the update. You could try VACUUMing the tables before the update (which will set hint bits) and see if you still get the dirtied blocks. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com