Millions of views - performance, stability

Hello! I have written a Python program to benchmark view efficiency, because views play a role in our platform and we noticed that their performance is lower than expected.
Basically, the benchmark creates a table:

CREATE TABLE IF NOT EXISTS foobar ( id int, text varchar(40) );

for i in range(1200300):
    INSERT INTO foobar (id, text) VALUES ({i}, 'some string');
    CREATE VIEW foobar_{i} AS SELECT * FROM foobar WHERE id = {i};
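For reference, the loop above as an actual runnable sketch. Only the generated SQL is exactly what I ran; the psycopg2 driver and the connection string are assumptions for illustration:

```python
def setup_statements(n):
    """Yield the DDL/DML statements the benchmark executes, in order."""
    yield "CREATE TABLE IF NOT EXISTS foobar ( id int, text varchar(40) );"
    for i in range(n):
        yield f"INSERT INTO foobar (id, text) VALUES ({i}, 'some string');"
        yield f"CREATE VIEW foobar_{i} AS SELECT * FROM foobar WHERE id = {i};"

def run_benchmark(n=1200300, dsn="dbname=bench"):
    """Execute the statements; the driver and DSN here are assumptions."""
    import psycopg2
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        for stmt in setup_statements(n):
            cur.execute(stmt)
```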

It couldn't be any simpler. This is Postgres 13.1 running in Docker on Ubuntu 20. However, I noticed that the performance of certain commands is strangely slow:
- dumping through pg_dump to tar took 13 minutes. The same table without the views: less than 1 second.
- restoring through pg_restore took 147 minutes. The same table without the views: half a second.
In another situation (not observed by me), dumping a real-world database with not just 1.2M empty views but also gigabytes of data in rows ran for many, many hours and ultimately had to be stopped.

What's even stranger is the performance of dropping: DROP TABLE foobar CASCADE;. First of all, I had to increase the lock limit to let it get anywhere; otherwise it quickly bailed out with an "out of shared memory" error:
ALTER SYSTEM SET max_locks_per_transaction = 40000;
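For context, the shared lock table holds roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) entries, and the cascaded drop needs a lock per view. A back-of-the-envelope check (the max_connections and max_prepared_transactions values are the stock defaults, an assumption about my container):

```python
# Approximate size of the shared lock table after the ALTER SYSTEM above.
max_locks_per_transaction = 40000   # the value set above
max_connections = 100               # assumed default
max_prepared_transactions = 0       # assumed default

lock_slots = max_locks_per_transaction * (max_connections + max_prepared_transactions)
views_to_drop = 1200300             # each dropped view needs its own lock

print(lock_slots >= views_to_drop)  # True: ~4,000,000 slots cover 1.2M locks
```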

But even after that, it took almost 7 hours and crashed:

2022-09-13 23:16:31.113 UTC [1] LOG: server process (PID 404) was terminated by signal 9: Killed
2022-09-13 23:16:31.113 UTC [1] DETAIL: Failed process was running: drop table foobar cascade;
2022-09-13 23:16:31.115 UTC [1] LOG: terminating any other active server processes
2022-09-13 23:16:31.115 UTC [1247] WARNING: terminating connection because of crash of another server process
2022-09-13 23:16:31.115 UTC [1247] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-09-13 23:16:31.117 UTC [97] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-09-13 23:16:31.136 UTC [1248] FATAL: the database system is in recovery mode
2022-09-13 23:16:31.147 UTC [1249] FATAL: the database system is in recovery mode
2022-09-13 23:16:31.192 UTC [1] LOG: all server processes terminated; reinitializing
2022-09-13 23:16:31.819 UTC [1250] LOG: database system was interrupted; last known up at 2022-09-13 23:15:47 UTC
2022-09-13 23:16:34.959 UTC [1250] LOG: database system was not properly shut down; automatic recovery in progress
2022-09-13 23:16:34.965 UTC [1250] LOG: redo starts at 2/3A3FEEC8
2022-09-13 23:16:36.421 UTC [1250] LOG: invalid record length at 2/5F355008: wanted 24, got 0
2022-09-13 23:16:36.421 UTC [1250] LOG: redo done at 2/5F354FD0
2022-09-13 23:16:37.166 UTC [1] LOG: database system is ready to accept connections

After updating Postgres to 14.5, it crashed in a slightly different way:

2022-09-15 19:20:26.000 UTC [67] LOG: checkpoints are occurring too frequently (23 seconds apart)
2022-09-15 19:20:26.000 UTC [67] HINT: Consider increasing the configuration parameter "max_wal_size".
2022-09-15 19:20:39.058 UTC [1] LOG: server process (PID 223) was terminated by signal 9: Killed
2022-09-15 19:20:39.058 UTC [1] DETAIL:  Failed process was running: drop table foobar cascade;

Without the views, the table can be dropped in 20 ms.

There must be something inherently slow in the way Postgres manages views. I know that under the hood a view is essentially a relation plus a rule referencing the parent table, so this setup is comparable to having about a million tables. They are not that lightweight, are they?
The issue is probably made worse by atomicity: dropping the foobar table with CASCADE has to drop all the views first, inside a single transaction. But why would handling them be so slow?

Assuming the above is true, I'm wondering if there is a way to improve the performance of Postgres commands, most importantly backup and restore, with this many views. Dropping the table is less important, but it would be good to have it working too, e.g. by first deleting the views in batches (my idea, I will test it).
But backups and restores must be faster and reliable in order to implement one feature in our platform.
Perhaps adding an index on views, so that Postgres can quickly assess how many there are and lock them; disabling something; tweaking some performance option... just throwing out ideas.
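The batched-drop idea I mentioned could look something like this; the batch size and the IF EXISTS guard are my guesses, and I have not tested it yet:

```python
def drop_view_batches(n_views, batch_size=1000):
    """Yield one DROP VIEW statement per batch, so each transaction
    needs only batch_size locks instead of n_views at once."""
    for start in range(0, n_views, batch_size):
        names = [f"foobar_{i}"
                 for i in range(start, min(start + batch_size, n_views))]
        yield "DROP VIEW IF EXISTS " + ", ".join(names) + ";"
```

Each yielded statement would be run in its own transaction, and only afterwards would the (now dependency-free) table be dropped.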

Please advise. Or maybe there's no hope of making this behave better :)

Regards,
Hubert

