Thanks, Ron, for the suggestions - I applied some of the settings, which helped throughput a little, but they were not an ideal solution for me - let me explain.
Due to the size, I do not have the option of using directory format (or anything else that writes to disk) for the dump, since that writes its output as a directory of files on disk (one per table, which is what lets it run multiple jobs). I do not have the several hundred TB of space needed to hold that output, and there is no practical way to get it, especially for a transient reload.
I have my original server plus a replica; since the replica also applied the WALs, it too filled up and went down. I have since rebuilt the replica as a standalone primary and am piping the dump from the original server into it, as I know it has enough space for the final loaded database and should have room left over after the clean rebuild (whereas the original server is still out of space because of the leftover files).
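For context, the pipeline itself is just a single stream over netcat, roughly along these lines (host name and port are placeholders, and BSD vs GNU netcat differ slightly on the listen syntax):

# on the rebuilt server: listen and feed the stream straight into psql
nc -l 5555 | psql -U postgres -d postgres

# on the original server: stream the dump across the wire
pg_dumpall -U postgres | nc rebuilt-host 5555

It works, but everything funnels through that one pipe.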
Incidentally, this state is also why restoring from a backup is not helpful either: the restore plus re-applying the WALs would just fill the disk again and recreate the original problem.
Even with the improved throughput, current calculations point to almost 30 days to recreate the database through dump and reload, which is a pretty horrible state to be in.
I think this is an area for improvement - especially as larger PostgreSQL databases become more common, I'm not the only person who could face this issue.
Perhaps an additional pg_dumpall mode that generates multiple output pipes (I'm piping via netcat to the other server); it would need to be combined with multiple listening streams on the receiving side, plus some degree of ordering/feedback to replace the essentially serialized output of the current pg_dumpall - something like the sketch below. But this feels like PostgreSQL expert developer territory.
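To illustrate what I mean (and as a partial workaround I could try today if the data happened to be split across several databases), something like one stream per database over its own port, with the globals sent first - all host names and port numbers below are made up:

# receiving side (GNU netcat wants "-l -p PORT"; BSD netcat is "-l PORT")
nc -l 5551 | psql -d postgres &     # stream for db1
nc -l 5552 | psql -d postgres &     # stream for db2
nc -l 5550 | psql -d postgres       # stream for roles/tablespaces

# sending side: roles/tablespaces first, then one pg_dump per database
pg_dumpall --globals-only | nc rebuilt-host 5550
pg_dump --create -d db1 | nc rebuilt-host 5551 &
pg_dump --create -d db2 | nc rebuilt-host 5552 &
wait

But that only parallelizes across databases, the sequencing between the streams is hand-waved, and within one huge database it degrades to per-table dumps which lose the ordering guarantees - which is exactly the coordination/feedback piece I'd want pg_dumpall itself to handle.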
Thanks
Tom
On 17-Jul-2024 09:49, Ron Johnson wrote:
---8<--snip,snip---8<---
On Wed, Jul 17, 2024 at 9:26 AM Thomas Simpson <ts@xxxxxxxxxxxxxx> wrote:
That would, of course, depend on what you're currently doing. pg_dumpall of a Big Database is certainly suboptimal compared to "pg_dump -Fd --jobs=24".
This is what I run (which I got mostly from a databasesoup.com blog post) on the target instance before doing "pg_restore -Fd --jobs=24":

declare -i CheckPoint=30
declare -i SharedBuffs=32
declare -i MaintMem=3
declare -i MaxWalSize=36
declare -i WalBuffs=64
pg_ctl restart -wt$TimeOut -mfast \
-o "-c hba_file=$PGDATA/pg_hba_maintmode.conf" \
-o "-c fsync=off" \
-o "-c log_statement=none" \
-o "-c log_temp_files=100kB" \
-o "-c log_checkpoints=on" \
-o "-c log_min_duration_statement=120000" \
-o "-c shared_buffers=${SharedBuffs}GB" \
-o "-c maintenance_work_mem=${MaintMem}GB" \
-o "-c synchronous_commit=off" \
-o "-c archive_mode=off" \
-o "-c full_page_writes=off" \
-o "-c checkpoint_timeout=${CheckPoint}min" \
-o "-c max_wal_size=${MaxWalSize}GB" \
-o "-c wal_level=minimal" \
-o "-c max_wal_senders=0" \
-o "-c wal_buffers=${WalBuffs}MB" \
-o "-c autovacuum=off"
After the "pg_restore -Fd --jobs=24" and "vacuumdb --analyze-only --jobs=24":

pg_ctl stop -wt$TimeOut && pg_ctl start -wt$TimeOut
Of course, these parameter values were for my hardware.

My backups were in progress when all the issues happened, so they're not such a good starting point, and I'd actually prefer the clean reload since this DB has been through multiple upgrades (without reloads) until now, so I know it's not especially clean. The size has always prevented the full reload before, but the database is relatively low traffic now, so I can afford some time to reload - ideally not 10 days, though.
There's no free lunch, and you can't squeeze blood from a turnip.
Single-threading will ALWAYS be slow. If you want speed, temporarily throw more hardware at it - specifically another disk (and possibly more RAM and CPU).
On Thu, Jul 18, 2024 at 9:55 AM Thomas Simpson <ts@xxxxxxxxxxxxxx> wrote: